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ABSTRACT 


The problem of estimating state variables and parameters of 
dynamic systems in the presence of random disturbances and measurement 
noise is examined for a wide range of dynamic systems, including linear 
and nonlinear systems in discrete- and continuous-time. This is a funda- 
mental statistical problem arising in, but by no means limited to, the 
areas of adaptive and optimum control, Parameter estimation is treated 
as a special case of state variable estimation. 


The general solution of the linear problem is given. Approxi- 
mation techniques are developed for the nonlinear problem and the results 
of simulation studies demonstrating the application of these techniques 
are presented, 


A dynamic programming formulation of the estimation problem is 
developed. 


A discussion of the concepts of controllability and observability 
is given in which particular attention is paid to the problems peculiar 
to discrete-time systems, 

Background material on matrix identities, Gaussian random vectors, 


the generalized inverse of a matrix, and probability density functionals 
is included in a series of appendices, 
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INTRODUCTION 


The introduction of statistical considerations into a wide range 
of engineering problems is a logical consequence of the existence of un- 
certainties of one type or another which frequently make a purely deter- 
ministic approach to these problems unrealistic. The field of automatic 
control abounds in these uncertainties, such as those arising from ran- 
dom disturbances, measurement noise and changing environment. Often un- 
certainties about the process or plant being controlled and about future 
commands are also present, Hence, it is not surprising that statistical 
methods have contributed significantly to the development of modern con- 
trol theory. 

A fundamental problem in the areas of adaptive and optimum control 
is that of estimation of state variables and parameters of a dynamic sys- 
tem when random disturbances and measurement noise are present, If the 
estimates of these quantities are to be used for control purposes, then 
the estimation problem must be solved in real time, The problem then is 
to synthesize an estimator which will act on all available data and con- 
tanually produce up-to-date estimates of the quantities of interest, In 
related problems of ex post facto data analysis the severe real time re- 
quirement is relaxed and, at the expense of computation time, priority 
is given to processing the often limited amount of data available in an 
optimal manner, 

This report is devoted to the examination of this estimation prob- 
lem for a wide range of dynamic systems, Chapter I presents the mathe- 
matical model to be used in the continuous-time version of the problem 
and the generality of the model and its relation to other problems is 


discussed, Because the analogous discrete-time problem is conceptually 


Jo 





simpler, we first examine it in detail in Chapter II before returning 
to the continuous-time problem in Chapter III. Chapter IV discusses 
the concepts of controllability and observability. Chapter V gives 
the results of some computational studies: while several concluding re- 
marks and suggestions for possible further research are contained in 
Chapter VI. Material of a complementary and ipctenentoey nature is 
included in a series of appendices, 

The principal references for this report are the work of Kalman 
[29, 31, 32, 34] and the paper by Bryson and Frazier [10]. An effort 


has been made to use a notation which is compatible with these references, 


ee 





CHAPTER I 
BACKGROUND AND PERSPECTIVE 
head Introduction 


The purpose of this chapter is to lay the groundwork for the 
analysis of the estimation problem and to discuss various aspects of 
the mathematical models considered in the sequel. After taking care 
of some notational preliminaries and reviewing briefly the state 
approach to dynamic systems, the continuous-time model is presented. 
This model is then related to control problems and to classical linear 
filtering theory, The author chose the continuous-time model for this 
preliminary discussion because it is easily related to the model 


appearing most frequently in modern control theory. 
1.2 Notation 


For simplicity vector notation is used throughout this work. 
All vectors are column vectors, Vectors are designated by under- 
lined lower case letters and matrices by underlined upper case 
letters, For example, the vector x and the matrix A are to be in- 


terpreted as follows: 


12 in 
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The transpose of x and A are denoted by x! and A’ respectively. The 


inner product of two vectors x and y is written as a matrix product, 


= 


ig oe 


That is, 


The Euclidian norm of a vector x is denoted by | x ||. 


V x?’x = Vx + coo + x8 


ms 
I 


Also 


Ix de 


2 
ee 
If Ais a positive definite matrix, the following special notation is 
used for the associated quadratic form 


Z 
jx] = xt& = 2, a xx 
A is a eae 


If x is a function of time, then x(t) is the derivative of x with 


respect to time. That is, 


dx, (t)/dt 
dx, (+) /at 


° 


x(t) = dx(t)/at = | | 


dx, (t)/at 


The scalar function V(X) (tb), 0000%, (t)5t) is written V(x(t),t). 


The gradient of V is a vector denoted by Vie The partial derivative of 


thes 





V with respect to t is written Vio That is, 
ov/ OX, 


V, = ov(x(t),t)/at 


Vy. = grad V = ‘ 


° 
ov/ ox, 


Vector valued functions are designated by underlined lower case 


letters, For example, 


fi (x, 9000 ee 9°00 Ut) 


f(x,u,t) = ; 


© 


f(xy 9o0o0°o Xo Us 9000 ust) 


We use (A f(x, )/ax) to designate the Jacobian matrix of partial 
derivatives {of,/ Ox, | evaluated at x = x. 
The segment of the time-function u(t) on the closed interval 


t, t= t, is designated ee 


tort] ° 





The convention of describing a dynamic system by a set of first 


order ordinary differential(or difference) equations or, more compactly, 


by a vector differential (or difference) equation is already well estab- 


lished in the control literature, Introductory accounts of "state-space 


techniques", the name usually given to a body of concepts associated 


with this characterization, are given in references [7,19,68 |. However, 


Since vector notation will be used throughout this report, it is appro- 


priate to review briefly some of the basic ideas and relations. 
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Informally, we may say that the future behavior of a system 
depends upon its present state and any forcing functions or inputs which 
may be applied to the system in the future, We limit our discussion to 
systems in which the minimum number of quantities necessary to specify 
the state of the system at time t is finite, The members X4 (t),000,X,(t) 
of such a minimum finite set are called state variables, The state vari- 
ables are elements of the state vector x(t) which ranges over a set X, 
called the state space, Examples of appropriate sets of state variables 
are the amount of charge on each capacitor and the amount of current in 
each inductor in an electric circuit, the position and momentum of each 
mass of a mechanical system and the output of each integrator of an 
analogue computer representation of a system, 

The equations of state for a continuous-time system are usually 
written as a system of first order ordinary differential equations of the 


form 
x, (t) a foxy (t),0005x (t) ou, (t)soooou (t),t) Lea Tycoon 


where Uy scoool,, are the inputs of the system, The equivalent vector 


equation is 
x(t) = £(x(t),u(t),t) (Ve) 
The analogous equations for a discrete-time system are 


x, (k+1 ) = £5 (x, (Kk) po009X,(K) Uy Ck) sear »u,(k),K) i= Ty000—eN 


x(k+i) = £(x(k),u(k),k) Gig2) 


o6= 





While equations (1.1) and (1.2) encompass a very large class of systems, 
it is well to remember that they do not include distributed parameter 
systems described by partial differential equations, systems with time 
delays described by differential-difference equations and other systems 
of more complicated types. 

As an example of the reduction of an nth order nonlinear differential 
equation to a system of n first order differential equations, consider 


the equation 


y (4) + Gee oe) = 0 


yet), Then an equivalent system of 


Let xX; = Ys X= Vs 0 © © » X = 


first order equations is 
x, (t) fa x, (t) 
x, (t) = x, (t) 


° 
° 


° 


x4 (t) = x(t) 


x, (t) ae (x, oC 2 X,0Ust) 


Important special cases of (1.1) and (1.2) arise when the system 


is linear, For a linear system (1.1) becomes 
x(t) = F(t)x(t) + G(t)u(t) Gie3) 
One can easily verify that the solution of (1.3) has the following form; 


t 
x(t) = (t,t, )x(to) + J $(t,s)G(s)u(s)ds 
to 
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where $(t,t,) is the transition or fundamental matrix of the system (1.3). 
The transition matrix is the unique matrix satisfying the following rela- 


tions: 


St 8(tyt,) = F(t) B(t,t,) 
Orta aL 


where I is the identity matrix, For the special case of time-invariant 


linear systems, F is a constant matrix and the transition matrix is given 


by 
oe) = exp [ (tat JE | = - (tot,)"F? 
n=o n! 


As an example of the reduction of a linear differential equation 
to a system of first order equations of the form of (1.3), consider the 


third order time-invariant equation 
y (t) + a y(t) +a, y(t) +a, y(t) = bu(t) + b,u(t) + b,w(t) 


Let x, (t) = y(t). Then an equivalent set of first order equations written 


in vector form is 


| x, (t) -a, 1 0 x, (t) bo u(t) 
= X(t) | =] 9a, 0 1 X(t) | + | b, 
x, (t) =a, 0 0 X(t) db, 


The linear version of the discrete-time system (1.2) is 


x(k+i) = F(k)x(k) + G(k)u(k) (1.4) 


ers 





The solution of the discrete-time linear system takes the following form; 


m | 
x(n) = Saket) + 20 Kase )O()MG) mk 
j= 
where 
$(n,k) = F(n-1) $(n=1 ,k) = F(no1) °°° F(k) . n>k 
é(k,k) =I 


Procedures for reducing difference equations to a set of first order 
difference equations are analogous with the ones illustrated for differ. 
ential equations, For a more detailed discussion of linear systems from 


the state point of view see [33]. 
1.4 Mathematical Model 


We begin our discussion with a mathematical statement of the 
continuous-time estimation problem, The motivation for the mathematical 
model used will become apparent in the discussion immediately following. 
The reader may wish to read over this section lightly at first, returning 
after the relation of the model to various problems is developed. 

The problem to be considered is the estimation of the state 
variables of a nonlinear dynamic system which is excited by white 
Gaussian noise, It is assumed that a nonlinear combination of the state 
variables corrupted by additive white Gaussian noise may be observed 
over a finite time interval. 


In particular, we consider processes which may be described by a 





system of nonlinear differential equations which can be written in 


vector notation as 
x = £(x(t),t) + G(t)w(t) (1.5) 


The observed signal is 


a(t) = h(x(t),t) + v(t) (1.6) 
where : 

x 1S an nevector, the state vector of the system 

wis an m=-vector, m =n, the random input 

Z is a p-vector, the observation 

Gis ann xm matrix 

v is a p-vector, the measurement noise 


The functions w and v are white Gaussian noise processes with zero 


means and the following covariance matrices; 


E[w(t)w'(7)] = Q(t) S(te7) 
El y(t)y'(7)] = R(t) S(t-7) for all t and? 
E[w(t)v'(7)} = o 


where EK is the expectation operator, éis the Dirac delta function, 
and Q(t) and R(t) are positive definite symmetric matrices which are 
continuously differentiable in t, It will be necessary to assume that 
£ and h satisfy certain differentiability conditions with respect to 
their arguments, 

This model may be represented by the block diagram of Fig.1. 


The wide arrows are used to indicate the flow of vector quantities, 


=1Qu 





The output of an oval block is obtained by the indicated nonlinear 


operation on its input. 


measurement noise v(t) 


random 
input 






observations 


£(x(t),t) 





Fig. 1. Model of the Continuous-Time System 


Observations begin at an initial time t,. The problem of real 
time estimation of state variables is to use the history of observations 
from t, up to the present time t (1.6, Z 

=[to st 
up-to-date estimates of the present state x(t). 


1) to continually produce 


It will be convenient to imbed this problem in the more general 


problem of estimating the entire segment x o this point will 
=[to,t + T] 


be discussed in subsequent chapters, 
1.42 


Relation to Optimum Control Problems 





The so-called optimum control problem is usually formulated 
in the following manner, The physical process to be controlled is 


represented by a system of ordinary differential equations of the form 
dx, (t)/dt = £5 (x ,000sX poly po00sU net) oh = 1,000 5M 


where X4y0000X, are the state variables of the process, Uy peoo ely, 


elle 





are the control variables and t is time, These equations are usually 


written in vector notation as 
x(t) = £(x(t),u(t),t) 


The control problem is that of choosing u(t) from a set of 
admissible controls so that the state of the system follows an optimal 
trajectory. Optimality is usually defined as the minimization of a 
scalar performance index which may in general depend on u, x and t, 

For example, if it were desired to move the system from an initial state 
X, to a specified final state x,, the performance index might include 
time of transit, energy expended, or a combination of these and other 
criteria. 

The practical considerations of model inaccuracies and external 
disturbances make it desirable to have a closed loop system, that is, 
one in which the control u(t) is specified as a function of the state 
x(t) rather than as a preprogrammed function of time, 

Despite its short history, the literature devoted to various 
aspects of this problem is voluminous, Various mathematical techniques 
including the caleulus of variations [14,18,36] , the "Maximum Principle" 
of Pontryagin [52,55,41|, Bellman's dynamic programming [3,516] , and 
steepest descent [9,35] have been used in connection with this class of 
problems, In almost all of this work it is necessary to assume that the 
state variables of the system are observable or exactly measurable or 
may be calculated instantaneously from other observable quantities, Be- 
cause of external disturbances, instrument noise and the intrinsic diffi- 
culty of physically performing the unbounded operation of differentiation, 


the assumption that the state variables of the system are known exactly 


=12-= 





is almost never justified, 

It is known that for linear systems with quadratic performance 
criteria it is possible to solve the estimation problem and the optimi- 
zation problem separately and still obtain the over-all optimum system 
(see [23] or [28]). No analogous result can be anticipated for nonlinear 
systems; that is, when estimates are used in place of the actual values 
in nonlinear systems we cannot assume that the over-all system will still 
be optimal, From the viewpoint of application, however, if estimates of 
state variables were available, one would have no alternative but to use 
them, since the problem of joint estimation and optimization for a non- 
linear system belongs to a class of extremely difficult unsolved statis- 
tical optimization problems, Indeed, either the estimation problem or 
the optimization problem alone is extremely formidable, 

A diagram of the over-all control system using the estimates of 
the state variables is shown in Fig, 2. The estimator operates on all 


observable quantities, in this case u and z, to produce an estimate of 


random 
disturbances 





control input 












commands x(t) 


process 







controller 







measurements 
2(t) 










estimator 






Fig. 2. Block diagram of over-all control system 


aoe 





x which it feeds to the controller, The controller produces u which 
acts on the process, The value of u is also fed to the estimator. 


The mathematical model for this noisy system is 


x(t) £(x(t) u(t), t)+G(t)w(t) 


h(x(t),t) + v(t) 


z(t) 


This is identical with the model of (1.5) and (1.6) except that we have 
the additional known quantity u appearing in the first equation, In our 
model this is accounted for by the explicit dependence of f(x(t),t) in 
(1.5) on the time parameter t. 

There is no loss of generality in assuming w(t) to be white noise, 
It has béen emphasized in the early work of Bode and Shannon [8], and 
zadeh and Ragazzini [67 | that a Gaussian process with a rational spectrum 
may be considered equivalent to the output of a linear system excited by 
white Gaussian noise. If we denote the state variables of this equivalent 
linear system by x(2) and the state variables of the process to be cone 


trolled by x1), we may form a composite state vector 


x") 


x(2) 


Ie 


and obtain equations of the form of (1.5) and (1.6). 
A simple example may help to clarify this point. Consider a 


scalar process of the form 


x(t) = £(x(t),u(t)) + r(t) 


z(t) 


h(x(t),t) + v(t) 


= tHe 





where r is a Gaussian random process with the spectrum 


Ww 


Pp) : Ren 


a 
Then we may consider r as the output of the linear system 
r=exearrtwyw 


where w is a white Gaussian noise process with the spectrum 


$y () = NP 


If we designate x by x1) and r by (2) | we obtain the following system 


of equations; 


x!) (4) £(x')(&),u(t)) + x62) (4) 0 


d = + w(t) 
at x2) (4) 2 ax'=) (t) 1 


2(t) = h(x") (t),t) + v(t) 
which are of the form of (1.5) and (1.6). 
1.43 





Although there is no universally accepted definition of adaptive 
control, most investigators would agree that a central problem in the 
theory of adaptive control is the identification problem [46], Changing 
environmental conditions are frequently incorporated in the mathematical 


model of the system as changes of certain parameters in that model. The 


a1 5x 





identification problem is concerned with automatically evaluating current 
values of the parameters, Because these parameters usually enter the 
model multiplicatively, the problem of estimating them is often equivalent 
to that of estimating state variables of a nonlinear dynamic systen, 

Estimation of parameters will be treated as a special case of es- 
timation of state variables, The basic technique is to augment the 
state-space of the system to include the parameters themselves and addi- 
tional quantities to account for the nature of the statistical variations 
of the parameters, For example, an unknown constant parameter b would 
be handled by letting x,,; equal b and adjoining the equation x44 (t) = 0 
to the basic system of equations describing the process, A randomly varying 
parameter is treated as the output of a linear system excited by white 
noise, In this case it is necessary to adjoin the states of this fictitious 
linear system to the states of the basic process, 

Let us consider a simple example to illustrate these ideas. A 
problem of considerable importance is the control of systems with lightly 
damped mechanical resonances [46]. Frequently there is a problem of tuning 
the compensating network as the resonant frequency of the system drifts due 
to changing environmental conditions, such as are commonly experienced by 
high performance aircraft. For a second order system a suitable model 


for this situation might be 


e 
» 
° 


tohyt(e tec) y=utw, 


Qo 
=e 
fo 
© 


nN 
I 
ted 
ae 


Vv 


where the random parameter c is considered to be the output of a first 


order system excited by white noise, Letting 


=16= 





x, =y X5 = VY x, = ¢ 
we may rewrite these equations in the form of (1.5) and (1.6) as follows: 


x, Xp 0 0 W 


_d Xp = -b X, = (c. + x3) x, Fu + 1 0 W 


%, a. oO 1 


Z = % eV 
1 


Again the explicit dependence of f(x(t),t) in (1.5) on t accounts 


for any known inputs such as control forces or test signals, 


1,44 Relation to Linear Filtering Theory 


The application of statistical methods to automatic control 
problems began with the classic work of Wiener [65] on linear filtering 
and prediction theory. The early exposition by Bode and Shannon [8] 
presented the principal results of the Wiener theory in a form more 
easily understood by engineers, At about the same time, an extension of 
the Wiener theory which was particularly significant from a control point 
of view was made by Zadeh and Ragazzini [67]. Both of these early papers 
Be 67 | emphasized the viewpoint that stationary random processes with 
rational spectra could be thought of as the output of linear systems 
excited by white noise, Several investigators (41327 | have extended 
these ideas to multidimensional systems, The work of Davis [13] gives 
a general solution to the problem of factoring spectral matrices and 
relates the linear stationary theory to the state-space approach. Today 
material on the one-dimensional Wiener theory and its extensions is avail- 


able in a large number of standard textbooks: for example, [11,12,39,42, 
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49, Dt 60, ee 
The problem we consider is a nonlinear version of the growing 
memory filters discussed by Kalman and Bucy [34] who studied random 


processes described by the linear vector differential equations 
x(t) = F(t)x(t) + G(t)w(t) Gis7Z) 
a(t) = H(t)x(t) + v(t) (1.8) 


which are the linear form of (1.5) and (1.6). They give the general 
solution to the problem of linearly estimating and predicting x(t) 
based on a finite observation interval, If F, G, and H are constant 
matrices and the observation interval extends to negative infinity, this 
is a multivariate Wiener filtering problem in which the random process is 
specified by the linear system (1.7) and (1.8) rather than by its spectral 
matrix. For this stationary case, (1.7) and (1.8) may be obtained by 
factorization of the spectral matrix [13]. It is indeed unfortunate that 
the important results of Kalman and Bucy are not well understood by many 
control specialists, This is, no doubt, due in part to the elegant but 
intricate nature of their original derivation. An interesting by-product 
of our study of the nonlinear problem has been a simplified derivation 
of their results for the corresponding linear problem. (see Chapter III.) 
The idea of considering a random process to be the result of exciting 
a linear system with white noise has great intuitive appeal. This is es- 
pecially true for a stationary process where a suitable linear system may 
be postulated as the result of direct measurement of the spectrum of the 
process, For the time-varying system of Kalman and Bucy it is not at all 
clear how to make measurements on a non-stationary process to find a suite 


able system, However, their work has direct application to control problems 
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in which prior information about a time-varying system is available, 

There is considerable appeal to the idea of considering a 
stationary non-Gaussian random process to be the result of exciting 
with white noise a nonlinear time-invariant system having a finite number 
of state variables, Again it is not at all clear how to make measurements 
on the process to determine the appropriate system, Our work is applicable 


to cases in which prior information about the nonlinear system is available, 


1.45 Generality of the Model 


It has already been pointed out that any known inputs such as 
control forces or test signals are taken into account by the explicit 
dependence of f(x(t),t) on t. We also showed how estimation of parameters 
could be treated as a special case of estimation of state variables, 

There is no loss of generality in assuming Q(t) is positive 
definite because of the presence of G(t) in (1.5). The restriction that 
R(t) be positive definite is more basic, To relax this restriction for 
the continuous-time problem is to admit the possibility of differentia- 
ting the observed signal [29]. 


A more general version of (1.5) would be 
x(t) = £(x(t) w(t) ,t) (1.9) 


The basic objection to (1.9) is that extreme care must be taken when per- 
forming nonlinear operations on white noise because white noise is defined 
by a limiting process and the appropriate limit may fail to exist, For 
example, it is meaningless to square white noise, This objection may be 


avoided by considering the equation 


x(t) = £(x(t),t) + G(x(t),t)w(t) (1.10) 
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For simplicity, we shall, in the main, confine our attention to systems 
in which the white noise enters linearly as in (1.5), pointing out where 
extensions can be made to include systems of the type (1.10) and well de- 


fined versions of (1.9). 


1.5 An Intuitive Approach 


Before becoming involved in a maze of equations it is perhaps 
worthwhile to see in which direction an intuitive approach might lead. 

Consider the problem of estimating the state variables of a non- 
linear system for which all inputs are known (no random disturbances) 
and the observations at the output are noisy, A simple approach to this 
problem is to feed the known inputs to a model of the nonlinear system 
which hopefully, after some initial transients have subsided, will behave 
in the same manner as the system itself, The model would be constructed 


So that its state variables were easily measurable, Such a scheme is 


nonlinear 
systen 


Shown in Fig. 3. 







known inputs 
measurements 






Fig. 3. Open loop model approach 


We shall say that the scheme of Fig. 3 uses the model in an open loop 
manner, While in theory this approach should yield good results, the 
practical considerations of model inaccuracies and random disturbances 


would render it useless in all but the simplest of cases, 
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The obvious way to combat model inaccuracies and random dis-~ 


turbances is by the use of feedback. This is done in Fig. 4. 









random noise 
disturbances 
known inputs 
—— system 







data 
processing 






adjustments 







model outputs 


Fig. 4. Closed loop model approach 


The closed loop configuration of Fig. 4 has great intuitive appeal. 
While it is not clear how the error should be processed in order to 
properly adjust the model, we would expect that something could be 
worked out. It will be shown later that the model approach may be 
derived as the result of considering the basic equations (1.5) and (1.6) 
and that there is a rational way of processing the data, 

It is amazing how many seemingly diversified topics fall into 
the class of closed loop model schemes, Kalman and Bucy [34] show that 
the solution of the linear filtering problem may be interpretated in this 
manner, Davis [13] gives a model interpretation when the measurement 
noise is not white, The very general Wiener theory of nonlinear systems 
(46, 66 | assumes that nothing is known about the system and, using white 
Gaussian noise as the known test input, develops a model in terms of an 


orthogonal series, 
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A variant on this theme is the optimization of the system by ad~ 
justing it to behave more like the model, The recent work of Sakrison [| 56] 
and Kushner [38] in this area may be cited as examples, A large number 
of optimization and/or learning schemes using models are to be found in 
the adaptive control literature (43, 46], 

The various schemes using models differ in the amount of prior 
information assumed and the way in which observations are processed to 
make improvements on the model or system, In our work it is assumed that 
the system is represented by known equations of the type (1.5) and (1.6). 
In the preceding sections it was shown that many problems fall into 
this category, There are many situations, however, in which adequate 
knowledge of the physics of the system is not available and one is unable 
to write equations of the form of (1.5) and (1.6) even using the technique 
of treating unknown parameters as additional state variables, Our work 


is not directly applicable to these situations, 
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CHAPTER II 
ESTIMATION OF STATE VARIABLES FOR DISCRETE-TIME SYSTEMS 
2.1 Introduction 


This chapter is devoted to the problem of estimating state 
variables for discrete-time systems, This problem is completely anal- 
ogous to the continuous-time problem discussed in the preceding chapter, 
The discussion of the usefulness and limitations of the continuous-time 
model presented there applies also to the discrete-time model to be 
introduced forthwith. 

Discreteotime systems are usually approximations of continuous-time 
systems, They assume great practical importance because digital computers 
work in discrete-time and complicated problems usually must be solved on 
digital computers, The extent to which the ability to use digital com 
puters to obtain numerical solutions to complex problems has influenced 
modern technology is evidenced in the fact that today an effective come 
putational algorithm is often considered to be a solution to a problem 
which, because of the limitations of purely analytical techniques, would 
not even have been investigated a decade ago, The possibility of using 
Special purpose digital computers as control system components is to a 
large extent responsible for present trends in control theory, We shall 
continually be mindful of computational difficulties as the investigation 
of the estimation problem proceeds, 

The equations which characterize the remainder of this work seem 
quite formidable at first glance, The reader should take heart in that 
the underlying ideas are simple and most derivations involve nothing 


more complicated than differentiation, 
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Zee Problem Statement 


The problem to be considered is the estimation of the state 
variables of a discrete-time nonlinear system which is excited by a 
sequence of independent Gaussian random vectors, It is assumed that 
nonlinear combinations of the state variables corrupted by additive ine 


dependent Gaussian noise may be observed, 
In particular, we consider systems which are described by vector 


difference equations of the following form; 
x(k+1) = £(x(k),k) + G(k)w(k) (2.1) 


The observed signal is 


2(k) = h(x(k),k) + v(k) (2.2) 
where 

X 1s an n-vector, the state vector of the system 

w is an m-vector, m =n, the random input 

Z2 ais a pevector, the observation 


Le) 


isan xm matrix 


is a p-vector, the measurement noise 


i< 


h and f are vector-valued functions 
ooo pW(K) pW(kKt ) 000 and coo, w(k) w(ktt),.0. 
are Sequences of Gaussian random vectors with zero means and the following 


covariance matrices: 


E({w(j) w'(k)] 
Elv(j) v'(k)] 
Elw(j) v'(k)] = 0 


Q(x) 55k 
R(k) $ 5x for all intergers j and k 
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Q and R are positive definite matrices, Eis the expectation operator 
and 5 is the Kronecker delta, Again there is no loss of generality 
by assuming that Q is positive definite, The restriction that R be 
positive definite will later be relaxed, 

Having observed a finite sequence{ z(0),.000,2(n){, we may, in 
general, seek an estimate of an entire sequence of states | x(0),..0,x(n), 
oeeX(ntm)}. We shall call this the estimation problem, In this formula- 
tion we include as special cases the filtering problem, where an estimate 
of the current state x(n) is sought; the smoothing problem, where an es- 
timate of the sequence { x(o),...,x(n)} is of interest; and the prediction 
problem, where an estimate of a future state x(ntm) is desired, 

As in the continuqus-time problem, we may easily include as 
state variables unknown or randomly varying parameters, Any known 
forcing functions, such as control inputs or test signals are accounted 
for by the explicit dependence of f(x(k),k) on the time parameter k, 

Such known parameters will usually simplify the estimation problem since 
they provide additional information about the system behavior, 
The model for the discrete-jtime system may be represented by the 


block diagram of Fig. 5. 


measurement noise _v(k) 
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input 


w(k) + x(k+1 
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Fig. 5. Block diagram of the discrete-time system 
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Kalman {31] first solved the linear filtering and prediction 


problems for the linear version of (2.1) and (2.2) given by 
x(k+1) = F(k)x(k) + G(k)w(k) 
2(k) = H(k)x(k) + v(k) 


In recent independent work Ho [25] has used an approach similar 
to the one we shall use to re~Ssolve this linear problem, 
The systems described by (2.1) are a subset of the systems defined 


by the more general equation 
x(k+1) = £(x(k),w(k),k) 0) 


Extensions to include these more general systems will be pointed out, but 
it is more convenient to work with (2.1) as the basic equation. 

Finally, we should note that the time parameter k designates 
the time of the kth observation, There is no need to assume that obser. 
vations are equally spaced, There is no loss of generality in assuming 


that observations begin at k=o, 





uence of States 





Let us first examine the problem of estimating the sequence of 
states Oyen having observed the sequence { ZiCO)igre are ,z(n)} ‘ 
Proceeding in a straightforward manner, we consider the a posteriori 


probability density function 


Px zy (0) peo0rX()|2(0),000,2(n)) 


which is the conditional joint probability density function for the 


=26 





sequence of states { x(0),.0,x(n)} on the assumption that the observation 
sequence { z2(0),...,z(n)} has occurred, This function will be, in general, 


extremely complex, Using Bayes' Theorem, we obtain 


Pxat zy C0) pe009X(N)|2(0) ,000,2(n)) 


Pon xB 60 go000 »2(n)| x(o) 9000 un) Px, (x(0) 9000 »x(n)) 
Pz, (Z(0) @oaeszn)) (2,4) 
Let us consider the terms of interest which appear in the 


numerator of this expression, Using (2.2) and the independence of the 


measurement noise, we may rewrite the first term as 
al 
Po | xq (260) v0 00 sZ(n)|X(0),000,X(n)) = || py (z(k)-h(x(k),k)) 
N'i°N k=o k 


where Py, is the Gaussian density function for the additive measure. 


ment noise v(k). 


The second term may be rewritten in the following form; 
Px, (200) s000.x(n)) = Py(x(o)) ° py (x(1)] x(0)) ere 
Py (x(n) | x(n=1 ),.00,x(0)) 


The independence of successive values of w makes (2.1) a Markov pro- 


cess [26], That is, 


Py (x(k) x(k=1),000,X(0)) = py, (x(k)| x(k-1 )) 


Substituting into (2.4), we obtain 


eve 





Pry | 60) pocoyX(N)| 2(0),+00,2(n)) 


n n 


Pz, (2(0) seleteepouct) (2.5) 


We assign an a priori Gaussian distribution with mean m and co- 
variance matrix P(o) for the unknown initial state x(o). It will be 
convenient, although not essential, to assume P(o) is non-singular, This 
restriction will later be relaxed, 

Before proceeding further, it is necessary to examine the nature 
of the input sequence, Let r(k) = G(k)w(k). Then ...,r(k),r(kK+1),06. 
is a sequence of independent Gaussian random vectors with zero means, 

The covariance matrix for r(k) is G(k)Q(k)G'(k). If we designate the 


probability density function for r(k) by Pry and then use the fact that 
x(k) = £(x(k-1),k-1) + r(k-1) 
we obtain formally 


P (x(k) {x(k-1)) =p, (x(k) m£(x(k-1), k=1)) 
k=1 


If GQG' is singular, then r is confined with probability one'to a 
hyperplane in n-space of dimension equal to the rank of GQG', and 
without resorting to delta functions we are unable to write an explicit 
expression for the probability density function Pry We note, however, 
formally 


Preeti] 0S Joo. yy (20D) C4) P (w(k)) dw(k) (2.6) 
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where the conditional density function p 


may be interpreted as a 
Pel Wk 


constraint, since r(k) is known once w(k) is specified. 

If, on the other hand, the covariance matrix GQG' is non-singular, 
an explicit expression may be given for the Gaussian probability density 
function Pr: In this case all the terms in the numerator of (2.5) may 
be expressed explicitly since Pr Pys and Pp, are all Gaussian with known 


means and covariance matrices, Then, (255) becomes 


(x(0),.00,x(n)|2(0),eeez(n)) = C(z(0),«+s,2(n)) * 


een 
n z Za 
exp (= $ D5 ilz(k)-h(x(k),k) |, | = 2] x(o)=m |I__, 
k=0 Resi) P (0 


N= 4 


2 
>. x(k )-f(x(k),k) | 1 
[GQG"] 


ne 


(37) 


where C(z(0),.0.,z(n)) is a normalizing factor depending only on the 
observed output sequence and known constants, The norm notation has 
been used to emphasize the positiveness of the quadratic forms appearing 


in this expression, 
2.4 The Estimation Procedure 


Now that an expression for the a posteriori probability density 
function has been obtained for the case in which GQG' is non-singular, 
the nature of the difficulties introduced by nonlinearities is apparent. 
One might choose an estimate x so as to minimize the conditional expecta- 
tion of some appropriate loss function. That is 


Min E [ ie8)) | 


AN 
x X |Z 
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For a quadratic loss function defined by a positive definite matrix 
W, it is well known [29] that the best estimate is the conditional mean, 


To see this let 


2 
Ly (xR) = jx] = x! Wx = 2htWx + Kew 
W 
Then 
E [Ly(-k)] = x"Wx =~ 2x"Wx + x"Wx 
x|Z 


Differentiating with respect to x and setting the result equal to zero 
we find that the best estimate is x, the conditional mean. 

The task of finding the mean of (2.7) is overwhelming because of 
the nonlinearity of the system, 

It is also well known [54] that for a linear loss function of the 


form 


L, (x-x) S 2 Cc. [x5-%, | c,> 0 


the ith coordinate of the best estimate is the median of the marginal 


distribution for x., To see this note that 


i 
Bele.) |= Ss Cx I XteeX: 
X1Z gard) 4 iad 


and 


cD 


apn , A A 
x5 =, | = J (x, =x; ) pj (x, ) dx, +t (x,-%;) pi (x,) dx, 


Taking the derivative of this expression with respect to X, and setting 
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the result equal to zero, we obtain the desired result, 


A 


Xs oo 
J pi (xy) dx, = f Py (xy) dx, 
“00 Be 
ic 
Again the nonlinearity of the system is our nemisis if we try to apply 
this result to (2.7). 

An appealing loss function for many applications is one that 
weights equally all errors greater than some threshold value, This 
would express the idea that once the size of the error exceeded a spec. 
ified level, an increase in the size of the error would not be signifi- 


cant, An example of such a loss function is 


ll x-x | »for ix | <= c 


I= 


C » for Ix-x || > ¢ 


I= dd 


For this type of loss function we would expect that the mode 
would be a better estimate than either the mean or the median, To gain 
Some insight into this situation, consider the loss function and the proba- 


bility density function for a scalar random variable @ shown in Fig. 6, 


p(@) L(0-8) 


> 0 oe ee 


Fig. 6. <A probability density function and a loss function 
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It is clear that for the situation of Fig. 6 the expected loss 
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(area of the product) will be minimized by locating the valley of 
L(@-6) to correspond to the peak of p(@). That is, the mode would be 
the suitable estimate, 

It is known (see [29] or [58]) that if the distribution function 
ia () is unimodal and symmetric about its mean, then the conditional 
mean is the best estimate for a large class of symmetric loss functions 
including hy » Lp and L. above, We hasten to point out that in this case 
the mean and the mode coincide and one might just as well say that the 
mode were the best estimate, 

The estimate we shall use is the mode of the a posteriori dis- 
tribution. This is closely related to the method of maximum likelihood: 
the difference being that we take the Bayesian point of view of assigning 
an a priori distribution to the initial state, 


For GQG' non-singular, the estimate will be obtained by minimizing 


z n Z 
2d, = ||x(o)=m 5: | 2(k)=h(x(k),k) | 
x(o)=m Net (0) 2. Z h(x gt as 


no 


+ S* | x(k+1)-£(x(k),k) || 4 
k=o 


2 
[GQG" ] (2.8) 


with respect to the sequence { X(0) jo’meX (ni) » this is equivalent to 
minimizing with respect to the sequences | x(o),.o0,X(n)} and | w(o),cee, 


w(ne1 ) , the expression 


x(o)—m + k )eh(x(k),k + w(k 


subject to the constraint 


x(k+1) = £(x(k),k) + G(k)w(k) 





Introducing an n-vector of Lagrange multipliers \ to incorporate the 


constraint, we may minimize 


zZ 
I = 4 ||x(o)-n | 


n 2 
+ 5° 4{/2(k)-h(x(k),k) | 
: p'(o) =o as no! (k) 


n= 1 2 
+ SPS pwd) | + Ate) [x(t af (x(k) k)-G(K)w(k)] } (2.9) 
k=o Qk) 


From (2.6) we see that the function I, may also be minimized when GQG! 
is singular, 

The more general problem of estimating the state sequence 
| X(0) p0000X(N)5000.X(ntm) } after the sequence {2(0),e00,2(n)§ has been 


observed may be reduced by an analogous development to that of minimizing 


Z n Z 
I. =% {x(o)- + = | 2(k)-h(x(k),k 
na FE hxColm fy + Do Nec) Ty 


ntm=1 2 
+ Sf Siw) Pg + ANCc) [x(ett nf (Ce). -G (ied) 
k=o Qk) | 


(2.10) 


A comparison of (2.9) and (2.10) yields 


ntn=1 Z 
tapes Let >. a 14 |wCk) J}, + Atk) [x(k+1 )a£(2(k) sk) =G (eda («)]| 
k=n Orn) 
(2541) 
From (2.11) we conclude that I,m may be minimized by first mini- 
9 
mizing I, and then setting w(k) equal to zero for k=n,...,ntm-1. This 
leads us to the important conclusion that predictions are simply extrapola. 
tions of present estimates, Specifically, if we let x(k [n) denote the 


estimate of x(k) given { 2(0),000,2(n){ , then 
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x(n+1(n) = £(x(nin),n) 


x(n+2|n) = £(x(n+1|n), n+) 


°o 


@ 


x(ntm|n) = £(x(ntm=1]n), ntm-1) (2.12) 


Several investigators (see [13] and [29]) have pointed out that extra- 
polation of state variables is an interpretation of what the Wiener 
predictor does in the linear case, 

For the more general system (2.3), the basic process is still 


Markovian, In this case a modified version of (2.9) is obtained 


2 n 2 
I = = + + || z(k)xh(x(k),k 
n= F1xC)-a I y 22 Fi 2()-b(x(e) ) I ta 


Neo | Z 
+S SMe) Hg + ANC) [x(t af Cx(e) sw) e)]f (2613) 
i= go" (x) 


For this case (2.12) becomes 


x(n+11n) = f£(x(nin),0,n) (2.14) 
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Having reduced the estimation problem to a minimization problen, 
the possibility arises of proceeding sequentially by a dynamic pro- 


gramming formulation [3, 4, 5, 20] and at each step obtaining an estimate 


1 


of the present state, This can be done is either f exists or GQG' is 


non-singular, 


= | 


Case i, f exists for k=0,...,n=1 
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If f7' exists we have from (2.1) 
x (ket) = £7! (2 (kc) (het )w (et ) Ket ) (2.15) 


In order to establish a sequential procedure, we define the following 


scalar "cost" function; 


2 2 
V,(x(0)) = lx(o)=m || 7 + || z(0)-h(x(0),0) | 4 

P™' (0) R™ (0) 
v_( = Min TONE 
nx{n)) es ccalamit {3 O nl P71 (0) 


Ht Ck)-n(x(k),k) I pawl) I 
+ Ke k),kK + k ee etetote 
ga, | 2(k)-b(s alc” Beg PEM Ponti) 


Cale) 
subject to the constraint (2.15). The separability property of V, 


enables us to rewrite this relation in the following form 


2 
V,,(a(n)) = be 130) _ 
w(t) W(0),000,W(Nea2) ee 


n=1 | 1° NZ | i 
+ k)@h(x(k),k att k 
2, I 2(e)-(e() oi) ts) | w(k) ei of 


k=0 
n 


2 2 
+il2z(n)eh(n),n) |, +l wet) |, n=1 4.00 
R (n) Q (n-1) 


subject to the constraint (2.15). This is equivalent to 
Vn(x(n))= Min _( (x(n=1)) + |z(n)xh(x(n), n) | 
w(n-1 ) 
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again subject to (2.15). Finally, using (2.15), we obtain the functional 
equation 


V_(x(n)) = Min V4 (£7! (4(n)-G(n-1 )w(n-1) ,n-1)) 
w(ne1 


+ ll wot) | m + || 2(n)-b(x(n),n) - 
QQ (ne1) R™ (n) (2et7) 

Let ¢ be any particular value of x(n). We may interpret V,(c) 
as a measure of the unlikeliness of the most probable sequence of states 
{x (0),200,x(n)} in which x(n) takes on the particular value c, given the 
observed sequence { 2(0),.00,2(n)j and the a priori distribution for x(o). 
The estimate of x(n) is that value of x(n) for which V,(x(n)) is minimum, 
This corresponds to choosing the x(n) of the most probable (least unlikely) 
of all possible sequences of states given the observation and the a priori 
distribution. 

This procedure may be extended to the more general system (2.3) 
if x(n-1) is uniquely specified by x(n) and w(n-1). For such systems 


there will exist a relation of the form 
x(n=o1) = g(x(n),w(ne1 ),n-1) (2a) 


Using (2.18) in place of (2.15) in the preceding development, the follow- 


ing functional equation is obtained; 


V,(x(n)) = Min Vo1 (a(K(@) w(ne1 ) sn=1 )) 
w(n-1 ) 


Ze z 
+|w(n-1) |, +] 2(n)eh(x(n) on) || __, 
Q (n-1) R 


(n) (2.19) 


36m 





Case ii, G(k)Q(k)G'(k) is non-singular for k=0,...,Nn. 


Let us first consider the problem of predicting one step ahead, 


Then the function to be minimized is 


n Z Z 
Pies (k)=h(x(k),k) || + || x(k+ )-£(x(k),k) 7 
a Bey { HecBetdtd My +H eH £6000) Haat 


2 
+ || x(o)=m || _, 
iE 
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We define a "cost" function as follows; 


Z 
W.(x(o)) = (0)= 
o(z(0)) : | x0 BM =t(9) 


Z n 2 
W  (x(n+1)) = Min (0 )= + (k)#h(x(k),k) | 
nt x(n i i oe a 2 || 2 J=h(x(k) pare 
2 
+ | x(k+1 Jo£(x(k) ,k) | =| }} n=1 $i do (2.20) 
(90a 


Using the separability property as before, this relation may be ree 


written in the following form; 


Wa; (X(n+1 )) = Min 


2 Neo J 2 
Min (0) = + k oh (x(k) ,k)| 
x(n) 88, aon : al (o) zat sree Met 


2 
+ || x(k )-£(x(k),k) | ot J+ | 2(n)-h(x(n),n) || al 
aR n 


Z 

[eQc*] 
Z 

+ || x(n+1 )-£(x(n),n) || a 
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Using the definition of W(x), we obtain the following functional 
equation; 


aga} 


2 
Wig (X(n1)) = Min (Wi (x(n)) + || x(nt )-£(x(m),n) | 4 
x(n) {ge 


+ || 2(n)-h(x(n),n) | a 
1 (G20, (2A) 
We may give W, +4 (x(n+1 Dy an interpretation similar to the one given 
to V,,(x(n)) in case i, Let c be any particular value of x(nt+1). Then 
W,4;(¢) is a measure of the unlikeliness of the most probable sequence 
of states | x(0),.00,x(n+1 ){ in which x(n+1) takes on the particular value 
c, given the observed sequence { z(0),..«,z(n){ and the a priori distribution 
for x(o). The estimate of x(n+1) based on the observed sequence { 2(0),000, 
2(n) | is that value of x(n+1) for which Wa + (x(n+1)) is minimun, 
The estimate of x(n) based on { 2(0),..0,z(n){ is the value of 


x(n) for which the following function is minimum; 


V,(e(m)) = WiG@(n)) + || z(@)-h(a),n) || rs 
, R (n) (2522) 

where V,,(x(n)) has the same interpretation as in case i, Since the value 

of x(n) for which W,, (x(n)) is minimum is the predicted value of x(n) given 

{2(0),cc0,Z(ne=1)}, we may give (2.22) the following simple intuitive inter. 

pretation, The new observation z(n) is weighted according to its reliability, 

Ro » and used to modify the predicted value of x(n) to obtain the current 


estimate of x(n), The degree of uncertainty of prior knowledge is reflected 


in the sensitivity of V,(x(n)) to changes in x(n), 
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If the basic system were linear, either of the two functional 
equations, (2.17) or (2.21), could be solved analytically by assuming 
a solution of the following form; 

P 2 
| x(n)-x(n) || + c(n) 
A(n) 
where A(n) is positive definite and is the inverse of the covariance 
matrix of the conditional distribution for x(n) given the observations, 
and where x(n) is the estimate and c(n) is a scalar, The recurrence re- 
lations obtained by Kalman [31] for the estimate and the covariance matrix 
can be obtained in this manner, but the algebra is quite involved, For 
a linear system, as we shall see shortly, an analytic solution may be 
obtained by other methods when the restrictions of cases i and ii above 
do not hold. In the literature [29, 31, 32, 25] it has been assumed that 
the transition matrix F(k) is non-singular which corresponds to case i. 

It is perhaps worthwhile to review briefly how the computation 
would proceed using the functional equations, For greater detail the 
reader is referred to [3, 4, 5]. 

For (2,17) and (2.19) w is said to be the decision vector, This 
is because an appropriate "decision", w(n-1), may be associated with each 
possible state x(n) in such a manner that (2.17) or (2.19) is satisfied. 
For (2.21) the situation is slightly different in that x is both the 
decision vector and the state vector, From (2.21) we see that with each 
possible state x(n) we may associate an appropriate decision x(n-1), As 
the computation proceeds at each step the value of the "cost" function 
and the appropriate value of the decision vector may be calculated for 
each possible value of the state vector, An estimate is obtained by 
choosing the value of the state vector for which the "cost" function is 


minimun, 


Boos 





If one is interested only in estimating present state variables 
or predicting future state variables, there is no need to store old 
values of the decision vector for each possible value of the state 
vector. This is different from most dynamic programming problems, 

If it is also of interest to improve the original estimates of 
past states after the entire observation sequence has been observed, then 
the sequence of appropriate decisions for each possible state must be 
stored. Trading time for memory, this may be done on tape. The revised 
estimate of x(ke1) is specified by the decision vector associated with 
the revised estimate of x(k). 

The importance of the dynamic programming formulation of the ese 
timation problem lies in the possibility of processing each new observation 
as it occurs, Herein there is hope of solving the estimation problem in 
real time, Modern computers have adequate fast memory to handle second 
order systems in this manner, For higher order systems approximation 
techniques, such as polynomial approximations, may be used in conjunction 
with the functional equations. Computational aspects of dynamic programming 


are discussed in | 5]. 





26 _Two-Point Boundary Value Problems 


Let us now consider the problem of minimizing (2.9) which is 


rewritten here for convenience , 


2 n ‘ 
n= 2 1x(0)-n I e 2 ll2()-B@) 1) I (x) 


Neo | a 
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Setting equal to zero the partial derivatives of I, with respect 
to w(k), \(k), (for k=0,.00,n=1), and x(k) (for k=0,0e00,n) yields the 


following equations which x(kIn) must satisfy; 


x(k+1|n) = £(x(kln),k) + G(k)w(k) k=Oine een tie | (2.23) 
w(k) = Q(k)G" (k)A(k) i=o4 yon (2.24) 
A(ke1) = ( af" (R(kIn),n)/ 9x) A(k) + (ah! (x(k[n),k)/2 x) 

R°'(k) [ 2(k) = h(&(k|n).n)] =1,.005n (2.25) 
A(n) = 9 (2,26) 


R(o|n) =m + P(o) (dh'(X(o|n),0)/3x) R°'(0) [z(o)-h(X(oln),o) | 
+ P(o) (af'(X(oln),0)/ax) A(o) (2.27) 


Combining (2.23) and (2.24) to eliminate w, we obtain necessary conditions 


in the form of the following two-point boundary value problem: 
x(k#1 In) = £(x(kIn),k) + G(k)Q(k)G!(k) A(k) k=0,000,n~1 (2.28) 


The equation for A is (2.25) and the boundary conditions are given by 
(2.26) and (2.27). The more general problem of minimizing (2.13) yields 


the following similar two-point boundary value problem: 
x(kH|n) = £(&(k|n),w(k),k) K=O) 000 ,Nel (2.29) 
wk) = Q(k) (o£ (x(kIn) ,w(k))/2w) A(k) K=050$e,N=1 (2.30) 


The remaining equations are identical to (2.25), (2.26) and (2.27) except 
that (af(K(kin),k)/ dx) is replaced by ( af (X(k|n) ,w(k),k)/ 3x) in (2,25) 


and (2.27) © 
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These two~point boundary value problems are discrete analogues 


of the one obtained by Bryson and Frazier [10]. 
Z2o7___the Linear Problem 
The linear version of the basic system is 
x(k+1) = F(k)x(k) + G(k)w(k) (2701) 
2(k) = H(k)x(k) + v(k) (2.32) 
In this special case the two=point boundary value problem is 
x(kH|n) = F(k)x(kin) + G()Q(ie)G" () d(x) K=0,e00 Mel (2.33) 
\(ket) = F(k) M(e) + HY(k)R"'(k) [ 2(k)-H(k)R(kin)] k=0,000,n~1 (2.34) 


with the boundary conditions 


X(oln) =m + P(o)H'(o)R™ (0) [ z(o)eH(o)X(oln)] + P(o)F'(o) A(o) 
(25) 
An) = © (2.36) 
Zell Preliminary Discussion 
Before presenting the general solution to this problem, we 
shall first consider the simple cases n=o and n=1 in order to develop 
some feeling for the nature of the solution, 
For n=0, (2.35) and (2.36) may be combined to give 
x(olo) = m + P(o)H'(o)R™! (0) [ 2(0)=H(o)x(olo) | (2237) 


or 


abun 





oe P(o)H'(0)R” (o)H(o) ] X(olo) = m + P(o)H'(o)R™! (0)z(0) 


The appropriate inverse will always exist for R(o) positive definite, 


Thus, 

A | = =| 

x(olo) = [ I + P(o)H*(o)R” (o)H(o) | [ m + P(o)H'(0)R™ (0)z(0) | 
This may be rewritten in the more convenient form 


X(olo) =m + [ I + P(o)H'(o)R™! (0) H(o) 17'P(o) Ht (o)R™ (0) [ 2(0)-H(o)m | 


(2.38) 
Let 
(0) = [1 + B(o)H*(0)B™"(o)H(0) ] ~'B(0) (2.39) 
Then (2.38) becomes 
£(olo) = m + G(o)H'(0)B™ (0) [ z(o)-H(o)m ] (2.40) 


A comparison of the boundary condition (2.35) and (2.37) shows that 


(2.35) may be reduced by a similar procedure to the following form; 
x(o[n) = x(o]o) + C(o)F'(o)d(o) (2.141) 


The relation (2.41) is the key to the general solution, as we shall see 
shortly. 


For the case n=1, (2.33), (2.34) and (2.41) become respectively, 


x(1|1) = F(o)x(ol1) + G(0)Q(0)G'(0)A(o) (2,42) 
do) = H'(1)R-N1) [ 2(1)— BRC11) (2.43) 
X(ol1) = x(olo) + C(o)F*(0))(o) (2,4) 
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The relations (2.42) and (2.44) may be combined as 

£(1|1) = F(o)X(olo) +[F(o)C(o)F'(o) + G(0)Q(0)G"(o)} d(o) (2.45) 
Let 

P(1) = F(o)C(o)F'(o) + G(o)Q(0)G' (0) (2.46) 
Using (2.46) and substituting for )(o) from (2.43), we rewrite (2.45) as, 

£111) = F(o)k(olo) + PU)HN(A)R™' (1) Lat) = BR(1I1)) (2.47) 


Let us note that F(o)x(ol 0) = x(1 lo) by (2.12), and compare (2.47) with 
(2.37). We come to the immediate conclusion that a procedure similar to 


the one used for n=o will yield an equation similar to (2.40). Let 

c(i) = [ L + PC)HN(1)R"' (181) 7] 7TP(1) (2,48) 
Then the equation corresponding to (2.40) is 

x(111) = (110) + o(1)Ht(1)B') [ 21 =H )Z(110)] (2.49) 


Equations (2.41), (2.46), (2.48) and (2.49) will now be shown to 


be similar in form to the solution for arbitrary n. 





_solution to the Linear Two=Point Boundary Value Problem 


Theorem: Consider the linear two-point boundary value problem given 
by (2.33), (2.34), (2.35) and (2.36) where P(o) and R(k) are positive 


definite for k=o,...,n. Then x(kin) satisfies the relation 
X(kln) = X(klk) + C(k)F! (k)A(K) K=O, 000 9M (2.50) 


where x(olo) is given by (2.40) and 


ody. 





c(k) = [1 + P(k)H'(k)R™! (k)B(«) ] ~'P(k) (2.51) 
P(k+1) = F(k)C(k)F'(k) + G(k)Q(k)G' (k) (2052) 
Rklk) = F(ke1)x(ke1 | ket) + C(k)H" (k)R™! (1) 


[ 2(k)-H(k)F(k=1 )x(ke1| kot ) | k=l e Cs on (2.53) 


To establish a proof we proceed inductively showing that if (2.50) 
is satisfied by x(kin), then it is also satisfied for x(k+1|n). 
Proof: Suppose that X(kin) satisfies (2.50) for some k € (O,e0e,n~1). 


Then, by (2.33), 
x(kH1 In) = F(k)X(klk) + [ F(k)G(K)F'(k) + G(k)Q()G"(k)] ACK) 
Substituting P(k+1) from (2,52) and using (2.34), we obtain 
<(kH1]n) = F(k)R(klk) + P(k+1 HY (iH! RT! (#1) [ 2 (ett -H(ieH RCH [n)] 
+ P(k+ )F(k+1 ) \(k+1 ) 
or 
ieee (ich HS (4 Re (k+1 )H(k+1 )] % (c+ | n) = F(k)X(k! k) 
+ P(k+1 )H! (k+1 Ro (k+1 )z(k+1) + P(k+1 )F! (k+1 )\(k+1 ) 


The needed inverse will always exist, After some manipulation, this yields 


the following equation; 
x(kH [n) = F(k)R(klk) + OCH HY (iH R71 (tt) [Dn (et) 
- H(k+H)F(kK)X(k/k) | + C(k+ EF! (e+ )) (k+1) 


Using (2.53), we obtain the desired result; 


mies 





x(k#H|n) = X(kH |k+1) + C(k+1 )F! (KH )A(kH ) 


To complete the proof we note that, by (2.41), the hypothesis is satisfied 


for x(oln). 


Discussion: Note that no assumption is made that F(k) is non-singular. 
The recurrence relations (2.51), (2.52) and (2.53) were first obtained by 
Kalman [31] in a somewhat different form and using a very different approach, 


Using the following matrix identities derived in Appendix A; 
= pH [ HPH' +R ]7' 
(ro +pur-’'n 7’ p = p-pH! [HPH'+R]7' HP 


we may combine (2.51) and (2.52) and rewrite (2.53) to obtain equations 


more closely resembling Kalman's; 
X(k|k) = K(klket) + P(k)H' (ik) [H(k)P(k)H! (k) + R(k)] 7 
[ 2(k)=H(k)x(k| ke1 ) ] (2.54) 
P(k#H) = F(k) { P(k)=P(k)H' (kk) [H(k)P()H"(k) + RC) ] 7! H(k)P Ck) } FY Ck) 
+ G(k)Q(k)G! (k) (2.55) 


For the linear system, (2.31) and (2.32), all probability distribu- 
tions remain Gaussian, P(k) is the covariance matrix of the distribution 
for x(k) given} 2(0),.00,2(ke1){ and C(k) is the covariance matrix of the 
distribution for x(k) given { 2(0),.00,z(K){. (see Appendix D) 

The solution of the problem of estimating the present state of the 
system is given by (2.51), (2.52) and (2.53), or equivalently by (2.51), 


(2.54) and (2.55). In practice the decision to use either (2,52) and 
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Fig. 7 Noisy linear system and optimal estimator 
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(2,53) or the equivalent relations (2.54) and (2.55) would be made to 
minimize the dimension of the matrix to be inverted, A block diagram 

of the system and the estimator is given in Fig. 7. The optimal estimator 
is linear and consists of a model of the system in a feedback loop. Com- 


pare Fig o 7 with Fig. 4, 


If the estimate, %(k|n) is desired, one could proceed as follows; 

1. Compute and save X(kIk) and C(k) for k=0,...,n using (2.51), 
(2.52) and (2.53) or, equivalently (2.51), (2.54) and (2.55). 

2, Calculate X¥(kin) by backward recursion of (2.34) and (2.50). 
This procedure begins by calculating M(t ) using (2.34). 
Then obtain X(n-1/n) using (2.50), Use (2.34) to obtain 
\Mna2) and then (2.50) to obtain x(n-2/n), Continue this 
alternating procedure until x(o|n) is obtained, 

An alternate procedure would be to combine (2.50) and (2.34) to 


obtain the following equation for A; 
(ket) = [IZ = HY (c)R7! (k)H(k)G(k) J BN(k) Mk) 
+ HY (k)R™' (kc) [ 2 (ke) =H (ie) &(Ke1)] (2.56) 
The following matrix identities can easily be derived by the 


methods of Appendix A; 


L-H'rty [r+parRa]~'p = [2 +HR ‘HP 7 


L~H' [B+ HpH'] ~'HP 


Using these identities, (2.56) may be rewritten in different forms, 


Then step 2 above can be replaced by the following; 
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2. Calculate )(k) by backward recursion of a convenient 
form of (2.56). Then calculate x(k] n) using (2.50). 
By repeated use of matrix identities (2.56) may be written in 


the following form which later will be compared with other results; 
AQet) ={Z = H'(k) [R(k) + H(c)P (kB! (k) 7! B(K)B(K) J EM(k)A(K) 
+ H'(k) [H(k)P(k)H'(k) + R(k) ] 7! [ 2(k)oH(k)R(kl ke1)] (2.57) 


The solution of the linear smoothing problem may be given in an 
even simpler form if we write the equations in terms of X(k|ke=1). Later 
when we discuss the application of linearization techniques to nonlinear 
estimation problems, the formulas we shall obtain will be analogous to 
the ones already given for the linear problem, since we shall wish to 
linearize about the point X(kik) rather than the point X(klk-1). 

Combining (2.50) and (2.54) using the matrix identities, we obtain 


the following equation; 
X(kIn) = X(klket) + P(c)H* (k) [ H(k)P(k)H"(k) + R(k)] 7" 
[2(k)-H(k)K(kike1) J+ p(k){ I = HY (k)[ H(k)P(k)H' (k) + R(k))7" 
H(k)P(k)} E'(k) (ks) 
Combining this relation with (2.57) yields 
X(kIn) = X(k|ko1) + P(k)A(ke1) 
We may summarize this result in the following corollary: 


Corollary: The solution of the linear smoothing problem is given by 


the following equations; 


HQ 





X(kIn) = X(k/k-1) + P(k)A(k-1) 
X(kH 1k) = F(k)x(klkeo1) + F(k)P(k)H' (k)[ H(k)P(k)H'(k) + R(k)} 7! 
[ 2(k) - H(k)X(kI k-1) ] 


where P(k) is given by (2.55) and A(k-1) is given by (2.57) for k=0,.e0,N. 
The a priori mean m is used for X(o |~1) in the above expressions and 


Mn) = 06 


2.73 ‘The Existence of[I + PH'R7!H i 





In the following discussion we shall use some matrix identities 


and the following elementary facts from matrix theory developed in Appendix A: 


i. The inverse of a positive definite matrix is positive 
definite, 

di. The matrix formed by adding a positive definite matrix 
to a nonenegative definite matrix is positive definite 
and hence, has an inverse, 

One may verify by direct multiplication that the unique inverse 


of I + PHIRT'H is 
I - PHY [ HPH'+R]~' H (2.58) 


If Ris positive definite and P is nonenegative definite, (2.58) will 
always exist by virtue of ii. Using (2.58), G(k) may be written in the 


following form; 


C(k) = P(k) = P(k)H'(k) [ H(k)P(k)H'(k) + R(k) | a H(k)P(k) (2.59) 


The relation (2.59) is a generalization of a lemma given by Ho [ 25], who 
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required that P(k) be positive definite, 

While the inverse of a matrix is unique, it frequently may be 
written in various forms which bear little resemblance to each other. If 
P(k) is positive definite for example, the inverse of [ Ee PH'R™'H ] may 


be written in the following form; 


[es ee Et Ry Eee (2,60) 


which surprisingly is identical to (2.58). Using (2.60), we may write 


(2.51) in the following form; 
otk) = [ Pv (ke) + HY (k)R°! (k)H(k) ) 7" (2.61) 


From (2.61) and ii it follows that if P(k) is positive definite, then C(k) 
is positive definite, This is in accord with the fact that P(k) and C(k) 
are the covariance matrices of the probability distributions for x(k) 
given | 2(0),.00,z(ke1) { and {2(0),...,z(k) | respectively, A singular co- 
variance matrix signifies that x(k) is confined with probability one to a 
hyperplane in n-space, It is clear that this cannot happen as the result 
of measurements which are corrupted with noise having a non-singular co- 
variance matrix R(k). 

Consider (2.52), which we rewrite for convenience, 

P(k+H1) = F(k)C(k)F'(k) + G(k)Q(k)G! (k) (2.52) 
From (2.52) we see that a sufficient condition for P(k+1) to be positive 
definite is that either G(k)Q(k)G'(k) be positive definite or C(k) be 
positive definite and F(k) be non-singular, This may easily be related 


to the basic equation for the system, 


x(k+1) = F(k)x(k) + G(k)w(k) (2,31) 
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From (2,31) we see that P(k+1) will be positive definite if every point 
in nespace may be Pencied in one transition from some point in the set 
of possible values of x(k) (specified by C(k) ) using some w(k). Thus, 
the condition obtained from (2.52) is seen to be sufficient but not nec. 
essary. 

Finally, we note that if P(o), R(k) and F(k) are non-singular for 


all k, then P(k) and C(k) are also positive definite for all k. 


2.74 Generalizations 


The restrictions that P(o) and R(k) be positive definite may be 
relaxed without altering the basic structure of the solution. In order 
to relax these restrictions we use the generalized inverse discussed in 
Appendix B, Appropriate background material concerning probabilistic 
aspects of singular covariance matrices is given in Appendix C. In this 
section we shall simply state the results obtained in Appendix D and com- 
pare them with those already obtained in this chapter. For a more detailed 
discussion the reader is referred to Appendix D, which considers a very 
general version of the linear estimation problem in which correlation 
between v and wis allowed and the covariance matrices P(o) and R(k) may 
be singular, 

The solution of the linear estimation problem is given by the 
following equations which are the same as (2.50), (2.59), (2.52), (2.54) 
and (2,57) except that in (2.59), (2.54) and (2.57) the inverse is ree 


placed by the generalized inverse: 


x(k|n) = x(k]k) + C(k)F! (k) \(k) K=0 erent (2.50) 
G(k) = P(k)-P(k)H'(k) [ H(k)P(k)H'(k) + R(k)] *H(k)P(k) (2,62) 
P(k+1) = F(k)C(k)F'(k) + G(k)Q(k)G! (k) (2.52) 
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X(k|k) = X(kike1) + P(k)H'(k) [ H(k)P(k)H'(k) + R(k) | i 
[ 2(k)-H(k)X(k1ke1 ) ] K=1 ,e00eN (2.63) 
(ket) =f I = H'(k)[ R(k) + H(k)P(k)H! (x) J# H(k)P(k) { F#(k)A(k) 


+ H'(k) [ R(k) + H(k)P(K)H' (k)]® [ 2(k)-H(k)E(kel ket ) } 
k=. een (2,64) 


The boundary conditions are 


m + P(o)H'(o)[ H(o)P(o)H'(o) + R(o)]* [ z(o)-- H(o)m] (2.65) 


x(olo) 


A(n) = 


jo 


The generalized inverse may be replaced by any pseudowjinverse, The 
pseudo-{inverse was used by Kalman [29 ]in his solution of the linear fil- 
tering and prediction problem, We have extended this work to include the 
smoothing problem and have relaxed the restriction that F(k) be nonesingular. 
The only restrictions remaining are that R(k) and P(o) be valid covariance 
matrices and that observations do not occur which are inconsistent with 
prior knowledge. This point is discussed in Appendix C. The block diagram 
of Fig. 7, in which the estimator producing up-to-date estimates of present 
state variables consists of a model of the system in a linear feedback 
loop, remains valid if C(k)H! (k)R™ (k) is replaced by 


P(k)H* (k) [ H(c)B Ck)" (k) + Rc) ] # 
These two expressions are equivalent if RT! CXiSts . 


If one is interested in the linear smoothing problem, the corollary 
of section 2.72 remains valid if the inverse in the equation for x(k+#1|k) 


is replaced by the generalized inverse, 
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2.8 An Approximation Technique 


2.91 Linear Observations 

There is a great need for approximation techniques which can be 
used to obtain estimates in real time, In this section we shall present 
a method which is computationally attractive and which possesses some 
intuitive appeal, 

For simplicity we shall first assume that h(x) is linear, an 
assumption which will appreciably simplify the resulting equations and 
which represents a case of great practical significance, In this case we 
say that the observations are linear, Consider the approximation to (2.1) 


obtained by the following linearization; 
x(k+1) = £(x"(klk),k) + (2 £(x"(kIk),k)/3 x) [x(k)-x"(kIk)] + G(k)w(k) (2.66) 


The point x"(klk) about which the linearization is made is the estimate 
of x(k) given { 2(0),000,2(k)}. We use the asterisk to remind ourselves 
that the estimates produced by this linearization procedure are only 
approximations and do not necessarily correspond exactly to the mode of 
the a posteriori distribution. 


If the function I, of (2.9) is replaced by 


2 n Z 
I” = 4||x(o)= + 4 || 2(k)oH(k)x(k 
n= $llx(o)-m lt ie 2, | 2 (=H (x(k) let i, 
Noi ; 2 i: 
+ oo { Slut | ntggy 72) [[ Get )-£(x" (elie), k)] 
= (3£(x*(clk),k)/2x) [x(cdex*(clk)] = GCe)w(e} (2.67) 


ote 


and the partial derivatives of Ta 


with respect to w(k), A(k) and 
x(k) are set equal to zero, a modified version of the two-point 


boundary value problem is obtained, The resulting 
a 5H 





equations are as follows; 
x (kHIn) = £(x"(klk),k) + (a£(x"(klk),k)/2 x) [ x" (k|n)=x" (klk) } 
+ G(k)Q(k)G! (k)A(k) (2.68) 


\Mke1) = (2£"(x" (klk), k)/2x) Mk) + BF ()R™ (k) [ 2(k)-H(k)x" (kin) J 


(2.69) 
The boundary conditions are 
Atm) = 9 (2.70) 
x (oln) =m + P(o)H! (o)R”' (0) [ 2(0)=H(o)x" (0) n) J 
+ P(o)(of! (x" (0 lo),0)/@x) do) Qi) 
A comparison of (2.71) and (2.35) indicates that (2.71) may be written 
in the following form which is similar to (2.41); 
x'(oln) = x"(olo) + 6°(0)(2£"(x (olo),0)/a x) Mo) 272) 


where x’ (0/0) and C”(o) are equal to X(olo) and C(o) respectively and are 
given by (2.40) and (2.39). It is not at all surprising that the solution 
of this linearized problem closely resembles that of the linear problem 


already studied, We define the following quantities; 
G"(k) = CZ + P'(k)Ht (kB! (i) ]~*P* (x) (2.73) 
P™ (eH) = ( a£(x (klk),k)/ 9x) O'(k) (aft (x (klk),k)/ax) 
+ G(k)Q(k)G!(k) (2.74) 


P(o) = P(o) (2675) 
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We now hypothesize that the x" (k|n) satisfying the two-point 
boundary value problem (2.68), (2.69), (2.70) and (2.71) is given by 


the following equations; 
x"(k|n) = x*(klk) + C"(k)(o £'(x (klk),k)/a x) M(k) k=0,000,n (2.76) 
where 
x"(k1k) = £(x" (ket [ket ),ke1) + 0"(k)H! (k)R7! (k) 
[ 2(k)-H(k)£(x" (ke1| ket) ket ) J I=1,000.n (2.77) 


and x" (oo) is given by (2.40). We again proceed by induction showing 
that if (2.76) is satisfied for some k € (0,...,n=-1) then it is also sate 
isfied for k+1., We know that (2.76) is satisfied for k=o. The steps in 
the following meee are completely analogous to those used to prove the 
theorem of section 2.72. 

Suppose that x" (k}n) satisfies (2.76) for some k € (0,c0ee,nea1)~ 


Then, by (2.68) 
x*(kHlIn) = £(x"(kIk),k) = (a£(x*(klk),k)/2x) x*(klk) 
+ (a£(x (kik),k)/2 x) x*(klk) 
+1 (@£(a* (klk) ,k)/Ox) O'(k) (2 £1 (x* (kik), k)/2 x) 
+ G(k)Q(k)G*(k)} ACK) 
or ae equivalently 
x*(kH In) = £(x (klk),k) +P (k+1) \(k) 


Substituting for \(k) from (2.69) this expression becomes 
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x* (cH jn) = £(x"(kik)»k) + P (lH )H! (eH )R™! (sett) 
[2(k+1) - H(k)x’ (K+ |n) J 
+ P (k#H)(2 £°(x (IH Iket),k#1)/3 x) ) (eH ) 
Solving this expression for x (k#1|n) yields 
x (k+ In) = [I a P (k+ )H' (k+1 )Rv! (k+1 )H(k+1 ) | = 
{ £(x"(kik),k) + P’ (ict! )H! (+t )RO! (+1 )z (e+) 
+ P (k+ (af! (x (ett lk+1),k+1)/Ox) d(k+1) { 
This expression may be rewritten in the following form; 
x*(kH|n) = £(x" (kk) ,k) + C* (K+ )H' (k+1 RT! (+t ) 
[ o(kH )aH(k+1 )£(x(k Ik) ,k) J 
+ C'(kH1)(O£' (x (k+t |k+1) +1 )/2 x) \ (+1) 
Using (2.77), this expression becomes 
x" (eet ln) = x* (cH leet) + OF (+1) (2 £1 (x (let [+t ),kH1)/d x) (KH) 


This expression satisfies the hypothesis (2.77) and noting that (2.77) 
is satisfied for k=o concludes the proof. 

Equations (2.73), (2.74) and (2.77) constitute a recursive scheme 
in which each new observation can be processed immediately to produce an 
up-to-date estimate, The matrices C” and Pp can no longer be strictly 
interpreted as covariance matrices of the distribution for x, as they 
could be if the basic system were actually described by (2.66) instead 
Of (2d) 
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A block diagram of the estimator described by equation (2.77) is 
given in Fig. 8. Again we see the model of the system appearing in a 
feedback loop. Equation (2.77) or, equivalently, the estimator of Fig. 8 
may be given a simple intuitive interpretation if we interpret the ine 


dividual terms in the following manner: 


£(x" (ket [ke1),ke1) a prediction of x(k) 

o*(k) a measure of present uncertainty 

H* (k) sensitivity of observation to deviations in x(k) 
Ro! (k) reliability of measurements 


H(k)£(x" (ket |ke1),k-1) a prediction of 2(k) 


The equation then has the interpretation that the error in the prediction 
of z(k) is weighted in proportion to present uncertainty, sensitivity of 
the observation, and reliability of the measurement. This weighted error 
is then used to modify the predicted value of x(k) to obtain an up-to-date 
estimate of x(k). 

Equations (2.73), (2.74) and (2.77) or equivalent equations ob- 
tained by the use of matrix identities are particularly suitable for real 
time implementation on a digital computer, Simulation seaaies may be 
conducted to evaluate the applicability of this approximation to a 
particular system, The results of some simulation studies using this 
linearized estimation technique are given in Chapter V, 

When a detailed analysis of all available data is to be made after 
the completion of some experiment such as a rocket flight, one is faced 
with numerical solution of the nonlinear two-point boundary value problen. 
Some method of successive approximations, such as a version of the gradient 


method of Kelley [35] and Bryson| 9], is usually called for, There is 
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danger of converging to a local minimum if the conditional probability 
density function (2.7) is not unimodal, This danger is reduced and the 
speed of convergence increased if a good first approximation is available, 
The logical choice for such a first approximation is x (k|n) which may be 
calculated by backward recursion of (2,69) and (2.76). Alternately, we 
may combine (2.69) and (2.76) to obtain the following equation for A 


which is similar to (2.56); 
Meet) = CL = B'Ck)Ro (K)B(K)O (k) ] (a £'(x*(klk),k))/a x) Ak) 
+ H'(k)R™'(k) [ 2(k) = H(k)x" (kik) ] (2.78) 
Then x*(kin) may be calculated using (2.78) and (2.76). 
2.02 Generalizations 


Having obtained some confidence that linearization techniques will 
produce equations closely resembling the equations for the linear estima. 
tion problem, there are several directions in which one could generalize, 
First, one could complicate the problem without introducing new nonlinear. 
ities by allowing R(k) and P(o) to be singular and permitting correlation 
between wand v,. That is, one could study the problem of Appendix D using 
(2,66) in place of (2.1). Second, one could introduce new nonlinearities 
by allowing h to be nonlinear and/or allowing w to enter nonlinearly as 
in (2,3). Third, one could combine these first two types of complications, 

We shall discuss each of these possibilities briefly, indicating 
how the results obtained previously would be modified in these situations, 
but omitting detailed derivations, 

The generalization to inelude singular R(k) and P(o) and allow 


correlation between v and w is completely straightforward. The derivation 
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of these results is completely analogous to section D5 of Appendix D 
and may easily be obtained by using section D5 as a guide. The basic 


equations which result from such a derivation are as follows; 
x*(klk) = x" (klket) + P™(k)H(k)B"(k) [ 2(k)-H(k)x"(klket)] (2.79) 
x"(kH|k) = £(x"(klk),k) + G(k)S(k)B'(k) [ 2(k)=H(k)x*(kik-1)] (2.80) 


x*(kl[n) = x*(klk) + P*(k) [ (2 £°(x*(kik),k)/2 x)=H! (k)M(k) 7] A(k) 


K=0 , 000 9h (2.81) 


\Mket) = [ (att(x*(klk),k)/o x) = H'(k)M (k)] (ks) 


+ H'(k)B*(k) [ 2(k)-H(k)x" (klket ) ] =1,.c08n (2.82) 


Pp (k+1) = (2 £(x*(kik),k)/ ax) P*(k) (af! (x*(klk),k)/2 x) 


+ G(k)Q(k)G"(k) = MY" (k)B*#(k)M*(k) (2,83) 
B*(k) = [ H(k)P*(k)H"(k) + R(k)] # (2.84) 
M*(k) = BY(k) [H(k)P"(k)( Of! (x"(kIk),k)/2 x) + $'(k)G'(k)] (2.85) 


A block diagram of the estimator described by equation (2.79) and 
(2.80) is given in Fig. 9. Again we have a model of the system appearing 
in a feaitek Loop. 

If new nonlinearities are introduced, then we are forced to further 
linearization, We shall first consider the case in which h is nonlinear, 
In order to acquire a feeling for this situation we shall examine the 
simplest possible example, n=o, Then, 


2 2 
I, = 21 x(o)-m Ss + 4])2(0)-h(x(o),0) | 
£ R7 


'(0) 1 (0) (2,86) 
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Fig. 9 Nonlinear discrete-time estimator when v(k) and w(k) are correlated, 
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or, equivalently, 


é 2 
1, = $1 x(0)-n| + $llz(o)-n(m,o)- [n(x(0),0)-h(@,0)] Il 
1 R 


"(o) (0) 


(2.87) 


This may be rewritten as 


2 2 2 
I, = alx(o)-all H105) * #}2(o)-b(@.0) I 1 a all (0) ,0)-B@,0) I 


(0) 
- [| h(x(o),0)-h(m,o) }! R°' (0) [z(o)-h(m,o) ] (2,88) 
The second term in this expression is independent of x. The third and 
fourth terms we expand about the a priori mean m retaining the first non- 
zero terms, The fourth term becomes 
(x(0)am)! (3 h' (m,o)/Dx) R°'(0) [ z(o)-h(m,0)] (2.89) 


The third term becomes 

$ (x(o)em)' { d[(ob'(m,0)/ax) F'(0) h(m,o)]/Ox | (x(o)-m) —(2.90) 
Let 

o7'(0) = P-l(0) + O[(ah*(m,o)/ax) R°'(o) h(m,o)] /ax (2.91) 


Then (2.86) may be approximated by the following expression 


2 2 
I, = $Ilx(o)-n | i 4 || z(0)-h(m,0) || 
Cc’ (0) R 


"(0) 


+ (x(o)-m)'(ah'(m,o)/2x) ce (co) [ 2(o)-h(m,o) ] (2.92) 
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Differentiating this expression with respect to x(o) and setting the 


result equal to zero yields 
x (olo) =m + C(o)(ah'(m,o)/ax) R°'(0) [ 2(0)-h(m,0) ] (2,93) 


Equations (2.91) and (2.93) are similar in structure to (2.72) and (2.77). 
If we interpret (dh'(m,o)/ax) as the sensitivity of the observations to 
deviations of x from the prior mean, we may give (2.93) and interpretation 
similar to that which was given in (2.77). The same basic structure is 
retained for arbitrary n, Note that h(x(k),k) is linearized about 

x" (k|Ke1) since x"(k|k) cannot be computed until z(k) is observed, The 
expression for C(k) is of the same form as the expression for C(o) given 
by (2.91). P(k+1) is computed by (2.74) as before. 

Let us again consider the case n=o, but now allow R(o) to be sing- 
ular, In this case, following Appendix D, we consider v to be the result 
of a linear transformation on a normalized random vector u. That is, 

v = Tu where TT' is equal to R(o). Introducing the Lagrange multiplier 


vector €, the expression to be minimized becomes 


2 2 
T,=elxcodenl| |  +2llall + & [2(o)-h(x(o),o)-Tu] (2.94) 
re (0) 


This expression will be approximated by the following linearized expression; 


2 2 
I= $llx(o)-n | ,. t zlall +&'{ 2(0)-h(m,0)-(0 h(m,o)/2 x) 
P™' (0) 
[x(o)-m] - Iu] (2.95) 


Setting the total differential, dI-(x(o),u,2), equal to zero yields the 
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following equations; 


x"(olo) =m + P(o)(a h'(m,0)/a x) @ (2.96) 
z(o) = h(m,o) + (eh(m,o)/ax) [ x (olo)=m + Tu (2.97) 
ne T*e (2.98) 


Combining these three equations, we obtain 
2(o)-h(m,o) = [ (dh(m,o)/dx) P(o) (@h'(m,o)/a x) + R(o)] @(0) (2.99) 


Using the generalized inverse to solve for a@(o) and combining (2.96) 
and (2.99) yields 


x(o) =m + P(o) (dh'(m,o)/ax) [(ah(,o)/2x) Plo) 
(ah'(m,o)/ax) +R(o)J* [2(0)-h(m,0)] (2.100) 


Note that (2.93) and (2.100) are not the same, nor can they be 
shown to be equal using matrix identities even if R”'(o) exists, The 


reason for this situation is that if RT does not exist we are forced to 


linearize earlier than if RB” exists. The procedure we have used for 
singular R is equivalent to approximating the second term in (2.88) by 
the following expression: 
2 
2 lz(o)-h(m,o)=(dh(m,0)/Ox) [x(o)-n]ll _, 
R- (0) 
Here the approximation takes place before the multiplication; whereas, 
previously we were able to multiply and then approximate the resulting 


expression, 


The basic structure of the situations which arise when h is non- 
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linear is well illustrated by the simple case n=o, The required modifica. 
tions of our previous results are evident in (2.91), (2.93) and (2.100), 
Suppose that w entered the system nonlinearly. We would then 


approximate 
x(k+1) = £(x(k), wk), k) (253) 
by 
x(k#1) ~ £(x"(klk),0,k) + (© £(x"(kik),0,k)/3x) 
[x(k)ex"(kik)] + (2 £(x"(klk),0,k)/a w) w(k) (2.101) 


It is clear that (2 £(x"(kIk),o,k)/@w) will play the role of G(k) in 
our previous work. 

The reader should have no difficulty in combining these results 
to meet the needs of any particular problem, How well these approximation 
technigues will work will, of course, depend on how well the system is 
approximated, the frequency of measurements, noise levels and other 
factors peculiar to the individual problem. Similar linearization schemes 
have been developed from another point of view in recent independent 


studies by other investigators [37, 59] who give encouraging results, 
2.8 Linearity in Disguise 


In certain special cases the approximation of (2.1) given by (2.66) 
becomes exact and equations given by the linearization technique are 


strictly correct, We rewrite (2.1) and (2.66) for convenience, 
x(k+1) = £(x(k),k) + G(k)w(k) Ca) 


x(k+1) = £(x" (k Ik) k) + (O£(x*(klk),k)/o x) [ x(k)=x” (k|k)] + G(k)w(k) 


(2.66) 
~60e 





The first case is the trivial one in which f(x(k),k) is linear, That is, 


when (2.1) takes the following form; 
x(k+1) = F(k)x(k) + G(k)w(k) Ca) 


The second case arises when all state variables are exactly measurable 
and x"(k|k) is equal to x(k). 

Various combinations of these two cases may arise in which it is 
not so obvious that the underlying estimation problem is linear even though 
the basic equations are nonlinear, Consider the case in which the state 
vector x may be decomposed into an exactly observable part x, and a ree 
maining part xX». The underlying estimation problem will be linear if the 


basic system is of the following form; 


X, (k+1) o Fo (k) x, (k) Go(k) 
(2,102) 
2(k) = x, (k) (2.103) 


In this case z may be substituted in the equation for x, to produce the 


following relation; 
Z(kH) = Fy, (a(k),k) 2(k) = By (a(k),k) X5(k) + G) (k) wk) (2.104) 


The relation shows that a difference of known quantities is equal to a 
linear combination of the unknown part of the state vector plus additive 


Gaussian noise, This, together with the following linear equation for X,, 
Xp (kH ) = Foo(k) Xo(k) + G(k) w(k) (2.105) 


Shows that the estimation problem is linear and the distribution for X> 


67 





remains Gaussian even though the basic system (2.102) is nonlinear. 
For example, consider the following simple system with a random 


parameter Xo; 


x, (k+t) = [ a + x(k) | x, (k) + wy (k) 
b Xo (Kk) + W (k) 


z(k) = x(k) + v(k) 


Xo (k+1 ) 


If the additive noise v is not identically zero the estimation problem 
is nonlinear and the use of the linearization technique would give only 
an approximate solution to the estimation problem, If v is identically 
zero this is a special case of (2,102), Then the estimation problem 
is linear and the linearization technique gives the exact solution to 
the estimation problem, We see from this simple example that the validity 
of the approximation may decrease as noise levels increase, This is in 
addition to the expected decrease in the performance of even the optimal 
estimator when noise levels increase, 

As a second example, consider the following system in which the 


transition matrix F(k) is random: 
x(k+1) = F(k)x(k) + G, (k)w(k) (2 Oop 
mike ak) (2,107) 


The elements of the transition matrix F(k) are considered to be the out- 
puts of a linear system excited by an independent noise sequence, In 
order to see how such a situation would be handled we shall examine the 


following second order system; 
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x, (k+1 ) i f, (k) £,5(k) x, (k) z G, (k)w(k) 


Xp (k+1 ) £,, («) £5 (k) x(k) 
z, (k) = x, (Kk) 25 (k) = x, (k) 


Let us augment the state-space by adjoining the random parameters to 


the state vector as follows; 


fy, (k+1 ) Xo (k+1 ) X3(k) 
f, ,(k+1 ) (k+1 ) (k) 
: - |* = ace Peg edt) 
Wi, (itt ) x, (K+ ) x(k) 
£5 (k+) x6 (KH ) X¢(k) 


Then the equation for the augmented system may be written in the form of 


(2.102) as follows; 


x, (K+ ) Pai) beeen) eG On | ican oe 
Mee iL ae ee ee eee 

(+1 ) | x, (k) 
ae | 9 A(k) ig al ae | 
x. (K+ ) . x_(k) 
x6 (+1) ! x, (k) 


The basic technique used for the second order example is readily 
extended to the nth order case, | 

The fact that a nonlinear estimation problem may become linear 
if certain noise levels go to zero can be a useful tool in investigating 
nonlinear estimation problems, If we observe that such is the case for 
@ particular estimation problem, we immediately acquire an upper bound 


on the performance that can be obtained, In addition, we may gain 
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insight into the coupling between various noises and the quantities to 
be estimated, For example, in (2.104) we see that the random input w 
is playing the role of additive measurement noise in the problem of 
estimating xX, 

Lest we become complacent, consider what would happen if for 
(2.102) we were able to observe only xX». It is evident that we would 
acquire no new information about x,. This difficulty raises the inter- 


esting question of observability which will be discussed in Chapter IV. 
2.0 Successive Linearization 


If one is interested in a problem of ex post facto data analysis 
the job of obtaining a numerical solution of the two-point boundary value 
problem arises, AS was mentioned earlier, a logical first approximation 
to the solution would be x (kIn). One method that might be considered 
is that of successive linearization. This method is recommended on the 
grounds that it is comparatively simple and that rapid convergence can 
be expected, Unfortunately, convergence cannot be guaranteed. This is 
the price to be paid for simplicity. In a problem where there is no re- 
quirement for real time solution it may be worthwhile to try this method 
first in hopes of simply obtaining a fast goa eien while running a slight 
risk of non-convergence, The results of some encouraging applications 
of this technique are given in Chapter V. 

The idea behind the method of successive linearization is this. 
The estimate x (kHi|k#i) is generated by linearizing about the point 
x (klk). After the entire observation sequence { 2(0),0«+,2(n) § has 
occurred, the smoothed sequence | x (kIn)}, being based on more measurements, 


would hopefully be a better trajectory about which to linearize than 


e706 





fx" (k|k)j was, One can then use a modified version of (2.77) obtained 

by linearizing about the sequence { x" (kin)/ to reprocess the observations 
and obtain a sequence | x, (kik)} . This sequence should be better than 
fx"(klk, Since the linearization took place along a better trajectory. 
This sequence is then smoothed using modified versions of (2.76) and (2.78) 
to obtain a second approximation to the solution (x, (kin). The procedure 
is then repeated linearizing about { x,(kin)} . 

In the development of the equations for this technique we shall 
designate the sequence i x*(kin)§ by { X, (kin) { to emphasize that it is the 
first approximation to the solution. Suppose that we have obtained the 
ith approximation sequence x, (kIn) and we seek approximation iti. Con- 


sider the approximation to (2.1) given by the following linearization; 
x(k+1) = £(x, (kIn),k) + (2 £(x, (kIn),k)/ox) [x(k)-x, (kIn) ] 
+ G(k)w(k) (2.108) 


If this expression is substituted into (2.9) a modified two-point boundary 


value problem is obtained; 
X, 4, (eHIn) = £04, (kin),k) + (O£L(x; (kIn),k)/O x) [ x, 4, (kIn)ox, (kin) J 
+ G(k)Q(k)G* (k)A(k) (2,109) 


Meet) = (B£'(K (kln),k)/3 x) Ak) + HN CKIRT' Ck) [ 2 (cH), 4 (eI n)] 


@.110) 
The boundary conditions are 
A(m) = 0 (22ap 
X,4,(oln) =m + P(o)H' (o)R™' (0) [ 2(0)-H(o)x, ,, (oln) J 
+ P(o) (af'(x, (oln),0)/2x) A(o) (2.112) 
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For simplicity we first assume that the observations are linear 
and that R(k) is positive definite. From (2.109) we see that if it con- 
verges, the successive linearization technique converges to a solution 
of the nonlinear two-point boundary value problem. 

The boundary condition (2.112) may be rewritten in the following 


form: 
Xa (oln) = x, 4, (olo) + &,(o)(a£! (x, (oln),0)/2 x) A(o) (25113) 
where Xe ay (olo) and Cc. (o) are (0 lo) and C(o) respectively as given by 


(2.39) and (2.40). 


As usual, we assume a solution of the following form; 


X44 (ln) = x4, (klk) + © ()(O£' (x, (ln), k)/2 x) Ac) (2.114) 


Substituting (2.114) into (2.109) yields 


X, 4, (etn) = x, | (ett k) + B, (eH) \(k) 2.115) 


where x. .,(k#1|k) and P. (k+1) are defined as follows; 
=| <A) 
Xie (eo |k) = £(x, (kln),k) + (2 £(x, (k|In),k)/2 x) 


[ x. 4 (kik) = x, (kin) ] (2.116) 


Pi (tt) = (a £0, (kln),k)/O x) O(k) (o £1 (x, (kIn),k)/2 x) 


+ G(k)Q(k)G" (k) (2.117) 


Combining (2.115) and (2.110) yields 


a2 





X44 (KH |] n) = x, ,, (kt1|k) + By (let JH! (+1 yr” (k+1) [ 2(k+1) ~ H(k+ x, ,, (k+1/n) | 
+ B, (k+1) (0 £'(&, (eH |n) ,k+1 )/ox) d\(k+1) (2.118) 


Solving (2.118) for Xe 4 (k+1|n) we obtain the following relation which 


satisfies hypothesis (2.114): 


Xa (k+i|n) = X. 4 (k+1 |k+1) + C. (k+1) (o£' (x, (+1 In),k+#1)/O x) A(k+1) 


(2.119) 
where 
4 (k#1|k#1) = Xe 4 (ktilk) + c. (k+1 )H! (k+1 se. (k+1 ) 
[ Z(H) = H(kH )x, ,, (kH| k) J (2.120) 
and 
= oc =1 
G, (k+1) ={I+ P. (leH JH! (cH! Ro" (e+ JH (e+ ) le EN Cc#1)) (2.121) 


We have now derived a familiar looking set of equations. The 
sequence eo (klk) § may be calculated using (2.116), (2.117), (2.120) 
and (2.121). The i+! approximation to the solution { Xe a (kn) § may be 
obtained using (2.120) and (2.110). 

These results may be put in an alternate form; combining (2.110), 
(2.114) and (2,120) yields 


Meet) = [2 - B8Ck)R' (c)H0c)G, (k)] (e £1 (x, (kln),k)/ 2x) dC) 


= = 
# [I - HNC)” (k)H(K)G, (k) ] BN C)RT! (ke) [ 2 (cdo ()xe, 44 (k [ket J 
2Hi22) 
By repeated use of matrix identities this equation may be written in the 


following form corresponding to (2.57); 


of jo 





Meet) =f T= HY(k) [ HO), (E(k) + B(c)] ~! BCc)B(K) Ff 
(2 £1(x, (kIn),k)/2 x) Q(k) + HY(k) [ HCc)P, ()B" Ck) + R(k)] ~ 
[ 2 (kk) = H(k)x(kl kt )] (2.123) 
Combining (2.116) and (2,120) and using the matrix identity (A.2) yields 
Xi 44 (kt 1k) = £(x, (k[n),k) + (2 £(% (kIn),k)/Ox) [X.4, (kl ket) - x, (kln) | 
+ (8 £(x, (kln),k)/ x) By (k) HY(k) [H(k)P(K)H! (k) + R(k)] 7 
[ 2(k) = H(k)x, 44 (kl ket) J (2.124) 


We may also write the following equation for P, (k) by combining (2.117) 
and (2,121) and using (2.58): 


Py) = (O£(K, (kIn),k)/Ox)} By (k) = Bi (e)H"(k) [ H(k)B(&)H"(k) + R(K) J ” 
H(K)P(K)} (2 £1 (x, (In), )/ Ox) + G0) Q(K)G" (ke) (2.125) 


We may summarize the method of obtaining the iti approximation 
sequence ix, + (k\n) § from the previous approximation sequence § x, (kn) $ 
in the following three-step Soca! 

1. Calculate the sequences ix ie (klk-1)} and P, (k) for 

kK=0,000,n using (2.124) and (2.125) respectively. 

The procedure begins by setting x.4,(ol-1) and P. (0) 
respectively equal to m and P(o) of the a priori dis- 
tribution for x(o). 

2, Calculate the sequence} \(k)§ for k= =1,.00,n by back- 

ward recursion of (2.123) using the boundary condition 


Mn) = 0. 


=D 





3, Calculate the i+! approximation sequence { Xs + (kl n) § 


for K=0 5000, using C2 allt on 


The same technique may be used when R(k) is singular if the inverses 
appearing in (2.123), (2.124) and (2.125) are replaced by generalized in- 
verses , 

The method may easily be extended to include the complications of 
nonlinear observations, correlation between the random input w and the 


measurement noise v, amd the random input w entering nonlinearly. 
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CHAPTER III 


ESTIMATION OF STATE VARIABLES FOR CONTINUOUS-TIME SYSTEMS 


ail Introduction 


The mathematical model for the continuous-time estimation problem 
was discussed in some detail in Chapter I. The explicit problem state. 
ment was given in section 1.41. 

The continuous-time estimation problem is analogous in many respects 
to the discrete-time problem of Chapter II. In the continuous-time prob 
lem we work with probability density functionals instead of probability 


density functions, Specifically, we consider 


Px 2 ‘Efto, t+ all 2 to.t] | 


the probability density functional for the time segment of x(t) on the 
interval [t., t + T] given the observation z(t) on the interval [t,,t]. 
An introductory discussion of probability density functionals is 
given in Appendix E, For the purpose of this chapter it will be suffi- 
cient to note that the summations appearing in the discrete-time problem 


become integrals in the continuous-time problem, 
o< Preliminary Considerations 


We asSign a Gaussian a priori distribution for the initial state 
x(t.) with mean eet) and covariance matrix P(t). It will be con- 
venient to assume that P(t,) is positive definite, although this restric. 
tions may be relaxed, 

We shall use t to designate the present time, t, to designate 


the time observations begin, and 7 as a general time parameter, The 
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estimate of x(7) given 2 ft t} is denoted by x(7 Goh 

In order to estimate x ito» t when G(t)Q(t)G'(t) is non-singular 
we may minimize with respect to a (tee t] the following functional which 
is the continuous-time analogue of (2.8); 


Teer ena a 
I(t) = $() x(t, )-X(t,| t,) I " + 4 d {| 2(7)2h(x(7),7) pa 


o to 


(7) 


° Z 
+ || x(7)-£(x(7),7) || Ed } dT (3.1) 
[ 9G "] 
Equivalently, we may consider minimizing the following functional; 


2 
I(t) =4 x(t j-X(t_|t 
) x(t, )=x(t,] t,) I nt (t.) 


tf {i acr-ncen I WuCr) | ar 
+> ZAT Jo T 7 = 5 w(7 
t | Ro?) ote) 


3 (302) 
with respect to Hit t] and ite t] Subject to the constraint 
Oo? 9 
x(7) = £(x(7),7) + G(7)w(7) (3.3) 


This functional, (3.2) subject to the constraint (3.3), may also be min- 
imized when G(t)Q(t)G'(t) is singular, 


If we wish to estimate X[t5, t + 7] (3.2) becomes 
o 


aO72 





2 
I(t,t +7) =4|[x(t_)ox(t,] t) || _ 
( ) = = I x(t,)-x(t,| a 


f tiemaemn I Ie) 
+3 2(7)=h(x(7),7 +l| w(t) |. [ a7 
t ~aao" R(t) g(t) 


O 
t+T 2 

+3 fo tw) at (3.4) 

t Q (7) 

nat is, 
t+T 2 
(tt+n=1t) +e fo fwd 4 ar 

t Ce (Ge, Ocsy 


Fron (3.5) it is clear that I(t,t + T) may be minimized by first min- 
imizing I(t) and then setting w(7) equal to zero fort £7 St+T. 
This, together with the constraint (3.3), shows that predictions are 
again extrapolations of present estimates, We may, therefore, confine 


our attention to estimating x [to,t]° 


g Formulation 





Note that because of the constraint (3.3), minimizing (3.2) with 
respect to w and x is equivalent to first minimizing with 
>. [to st] = [tot] . : 
respect to w +} and then minimizing with respect to x(t). This leads 
O°? 


us to define the following "cost" function: 


Zz 
Vex(t)t) = Min 4 $ x(t Rte] ty) bat 


H ito. t} t) 


O 


t : 
af [decrace.o a +hwc by | at 
ts i a" 606 
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subject to the constraint (3.3). 

Note that minimizing with respect to Mito, t) leaves a boundary 
condition on x to be specified. We satisfy this boundary condition by 
specifying the final point x(t). This constitutes a departure from most 
dynamic programming problems in which the initial condition x(t) is 
specified. Our procedure here is similar to that used in case i of 
section 2.5, in which f-! existed, No similar assumption need be made 
here because integration of a differential equation may proceed in either 
direction, whereas there is a definite direction associated with a differ. 
ence equation, 

We may give V(x(t),t) an interpretation similar to the one given 
to Vv (x@)) in the discrete-time problem, Let c be any particular value 
of x(t). Then V(c,t) is a measure of the unlikeliness of the most probable 


state trajectory x in which x(t) takes on the particular value c, 


[tot 


given the observation z and the a priori distribution for x(t). 


[toot] 
The estimate of x(t) is that value of x(t) for which V(x(t),t) is minimum, 
Using the separability property we may rewrite (3.6) in the follow. 


ing form: 


c 2 
V(x(t),t) = Min V(x(t)\2axX,t=- at ) + 3 7 [ a(T)-h(x(7),7) | 
tare R 


*E- 4 t,t] ° 


2 
+ |lwr) ||, | d 7 
Q (7) 
Subject to the constraint (3.3). Expanding V(x(t),t) in a Taylor series, 


letting »t approach zero and using the constraint (3.3), we obtain the 


following equation: 


7 Os 


(7) 





2 
Vz = Min (= vi [ £(x(t),t) + G(t)w(t)] + 3 || a(t )-h(x(t),t) | _, 
w(t) oe R 


+ $l w(t) | ae 
Q (t) (3.7) 
The procedure used to obtain (3.7) is standard in the dynamic 
programming literature, For more detailed discussions of this procedure 
and its relation to the calculus of variations see [3, 5, 15, 16]. 
The w(t) which minimizes (3.7) is easily found by differentiation 
to be 


w(t) = Q(t)a' (tv, | (3.8) 


Substituting this expression for w(t) into (3.7) yields the following 
important relation: 


Z 
end My ae, = VL AGE the Sea Ie : 


(3.9) 
At this point our problem is twofold: 


i) Find a function V(x(t),t) which satisfies (3.9), 
ii) Find X(t|t),, the value of x(t) for which V(x(t),t) is minimum, 
A boundary condition for (3.9) is given by the a priori distribution for 


the initial state x(t), since for t = t., (3.6) becomes 


| Z 
V(x(t,).t,) = $ Ix(t,)-x(t.1t,) Il elie) (3.10) 
= O 7 


Once V(x(t),t) has been found, we may obtain x(7|t) for t, ere 


by backward integration of the following equation formed by combining 


=80= 








(e353) and) (3.8); 
axit)/ar = £(&(7It),7) + G(7)Q(7)G" (T)V,, (3.11) 


using x(t |t) as the boundary condition. 
There is no difficulty in including in the dynamic programming 
formulation systems in which the random input w enters nonlinearly. For, 


example, if the basic system is described by the following equation: 
x(t) = £(x(t),t) + G(x(t),t)w(t) (3.12) 
then (3.8) becomes 


w(t) = Q(t)G" x(t), t)V, (3613) 


and (3.9) becomes 


Ve =F Vy (x(t), t)Q(t)ar (x(t), tv, - Vy, £(x(t),t) 


2 
+ 2 ia(t)-n(x(t),t)il 
Rv’ (t) (3.14) 


For well defined systems described by the following equation; 
x = £(x(t),w(t),t) (3.15) 
(3.7) becomes 


V. = Mi -V. £(x(t),w(t),t t)ah(x(t),t 
: ns, galt) ule)st) + $ z(t )enGa(t),t) I ae 


+ $ || w(t) ' 
Gace t) | (3.16) 


=o81— 





and (3.8) becomes 
w(t) = Q(t)(o £1 (x(t), w(t),t)/a w)V, (3317) 


Equation (3.16) is equivalent to two equations: (3.17) and the 


following equation: 


: é 
Ve =~ V, L(t) w(t),t) + 2] 2(t)-p(t),t) | a(t) 


2 
+$llw(t) I _, 

Q» (t) (3.18) 
We shall later relate (3.15) through (3.18) to the two-point 


boundary value problem obtained by Bryson and Frazier [10]. 


3.4 Solution to the Linear Problem 


The solution of the linear filtering problem discussed in section 
1.44 was first obtained by Kalman and Bucy [34]. In this section we give 
a Simplified derivation of this solution based on the dynamic programming 
formulation of the problem. In order to simplify the resulting equations 
it will be convenient to omit the parameter t when no confusion is likely 
to result, 


When the basic system is linear (3.9) becomes 


' ; : 
V, == V, G(HQ(HIG'(t) Vi = Vi E(t)e(t) + £ll a(t)-H(t x(t) i) a 


(3.19) 
It is well known [44, 45] that an analytic solution may be obtained for 
equations of this type by expressing V(x(t),t) in a series of the follow- 
ing form; 


~Bb2— 





V(x(t),t) = a(t) + b'(t)x(t) + S x'(t)0(t)x(t) 


For our purpose it will be more convenient to write this expression in 
the following symmetric form in which the minimum of the function is 


shown explicitly; 


2 
V(x(t),t) = $llx(t)-x(t]t) || 5 © + k(t) (3.20) 
P(t) 


In order to determine a (tle), P we and k(t) we substitute Ve 


and V, obtained from (3.20) into (3.19) and equate terms of like degree 


in x(t). 


Differentiating (3.20) we obtain the following equations; 


ees Bl (t) [ x(t)-K(tIt)) (3.21) 
a | x =| a, 2a 1 A lawl oe 
vy, = bat b'x- [Reet ee BT] xe ager eR + w/at (3.22) 


Substituting (3.21) and (3.22) into (3.19) and equating terms of 
like degree in x, we obtain the following equations after some algebraic 


manipulations 3 


F-EP + HIRE 


= - Plaga! pl. prt 


@ 
Iu, 


(3523) 
dX(t|t)/at = F(t)x(t|t) + P(tH'(t)R-'(t) [ 2(t)-H(t)k(t]t)] (3.24) 
Using the identity P = - P P”'P we may write (3.23) in the following form; 
P(t) = F(t)P(t) + P(t)F'(t) — P(b)H'(t)R™! (t)H(t)P(t) 
+ G(t)Q(t)G! (t) (225) 


The final form of the optimal linear filter is given by (3.24) 
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Fig. 10 Optimal linear estimator 
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and (3.25) which are identical to the equations obtained by Kalman and 
Bucy. A block diagram of the optimal estimator is given in Fig. 10. Here 
we again find the intuitively appealing feature of a model of the system 
appearing in a feedback loop. 

As in the discrete-time case, P(t) is the covariance matrix of the 
probability distribution for x(t), given the observations, Equations (3.24) 
and (3.25) remain valid when P(t) is singular as can be shown by taking 
the limit of the discrete-time solution. If P(t.) is singular the dynamic 
programming formulation breaks down because trajectories passing through 
points which are inconsistent with prior information become infinitely 
more unlikely than trajectories which do not, and we are unable to reflect 
this situation in a scalar "cost" function. This is not a serious drawback 
since the a priori distribution may be chosen to approximate the state of 
prior knowledge aot closely without introducing a Singular coe 
variance matrix, As in the discrete-time case, this difficulty is largely 


bypassed if GQG' is non-singular, since then P(t, +A t) is positive def- 


inite, 





Approximate Solution to the Nonlinear Problem 


In order to develop an approximate solution to the nonlinear prob- 
lem we follow a procedure similar to the one used in discrete-time. We 
may rewrite (3.9) in the following form: 

1 : 8 : a 
Vy =e zig (tv, ct) - V_ £(x(t),t) +3 lla(t)-n(x (tlt).t) 


2 
-  b(x(t),t)=-b@ (tlt),t) | 
| | R™' (t) (3.26) 


sees 


a eee 





Since the estimate of x(t) is the value of x(t) for which V(x(t),t) 
is minimum, we would expect that V(x(t),t) could be closely approximated 
in a neighborhood of this minimum by a quadratic form in x. Such a 
quadratic form has already been shown to be the solution in the linear 
case. We Shall use ett) to designate the estimate produced by this 
approximation. The terms on the right hand side of (3.26) involving f 
and h will be expanded in a Taylor series about the point x(t). The 
argument t should be understood but will usually not be written. All 
partial derivatives are to be evaluated at the point (x"(t|t),t). 


The approximation for f is simply 
£(x(t),t) = £(x*) + (of/ax)[ xx" ] (3.27) 


The term involving h may be broken down into the following sum of three 


terms: 


Z 2 
* = 
$z-n(x") ||, + SU pG)en(x") || = [bt (adapt )) 8! [ 2-h(x")] 
Rr RoI 
The first of these terms is independent of x. The second term is approx- 


imated as follows; 


é t x 
+ Ia @-n(x") | win 2 [xx] { af(on'/ox)e'h) /a x} [xx") 
7 (3.28) 


The approximation for the third term is 
~ [h'(x)eh'(x")] BU! [zeh(x")] 2 = [xex"]' (On'/ox) BT [ 2-b(x*)) 
(3.29) 


If these approximations are substituted into (3.26) the following equation 


results: 


B60 





Z 


ei 


to} 


~ V. f £") + (af/dx) [xx] | 


| StV,. II 
ome Q ae 


é * * 
+ elena’) I, telex)! [ e[(on/oxE 'n} /ox} [xx] 


* i: * 

~ [xx]! (ob'/ax)R [2-h@’)) (3.30) 
An analytic solution of (3.30) may be obtained by assuming that V(x(t),t) 
is quadratic in x. Specifically, 


az 
Vix(t),t) = $ llx(t)-x (tlt) I et ie + k(t) GES) 


It is straightforward, but algebraically involved, to substitute 
into (3,30) the V.. and V, obtained from (3.31) to obtain the following 
equations which describe the estimator; 


dx (t[t)/dt = £(x (tlt),t) + P*(t)(a h'(x" (tlt), t)/Ox)R” (t) 
[z(tjoh(x (t|t),t)] (3.32) 
P*(t) = (o£/ax)P*(t) + P (t)(aft/ox) + G(t)Q(t)a! (t) 
~ P(t) { af(ont/ox)r'n] /ox} P(t) (3.33) 


The approximate solution to the problem of estimating the state 
variables of a nonlinear system in real time is given by (3.32) and 
(3.33). For linear systems these equations reduce to those obtained by 
Kalman and Bucy. The initial conditions for P (t) and x (t|t) are the 
P(t.) and x(t,|t,) given by the a priori distribution for the initial 
state x(t,). 


A block diagram of the nonlinear estimator in a control system 
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is given in Fig. 11. In the diagram the control vector u is shown ex- 
plicitly, Again we find the intuitively appealing design which places 
a model of the basic system in a feedback loop. 

If the random input w enters the system nonlinearly we may retain 
the basic structure of the estimator if we replace G(t) by G(x (tit), t) 
or (3 £(x" (t|t) Ob) /Sw) -- whichever is appropriate. 

A first approximation to the solution of the problem of estimating 
x(7) for t, £7 € t is given by x" (7|t) which can be obtained by backward 
integration of the following equation using x (tlt) as the boundary 
condition: 


ax (ilt)/or = £(x"@lt),7) + a(ra(nar (Mv, 
a (3.34) 





The two-point boundary value problem obtained by Bryson and 
Frazier [10] using the calculus of variations follows directly from the 
dynamic programming point of view. We shall consider the most general 
case in which the noise enters nonlinearly as described by (3.15) through 
(3.18). The procedure we follow is based on the one used by Dreyfus [16] 
in his study of the problem of Mayer in the calculus of variations. 

In order to simplify the resulting expressions the arguments x, 

w and t usually will be omitted when no confusion is likely to result, 
The following relation follows from the rules of differentiation; 


dv /at = OV /at + (av./ax) x 
= 2 x (3.35) 


Differentiating (3.18) with respect to x yields 


Bpos 





2V,/ ot = -(aft/a x, - (2V,/ dx) 
~(an'/ax) B [ gh(x,t)] 
~(aut/ax) [ (2f!/2u) v= Ow) (3.36) 


Noting that by (3.17) the coefficient of (ow! /Ox) in (3536) is zero, 


we may combine (3.35) and (3.36) to obtain the following equation; 


av,/at = = (aft/ax)V, - (@h'/ax) RB! [z-h(x,t)] (3.37) 


This equation, together with (3.15) and (3.17), are the basic relations 
for the two-point boundary value problem, At the final time t we have 
the following natural boundary condition which corresponds to the 
requirement that Cols) is the value of x(t) for which V(x(t),t) is 


minimum: 
ee = 0 (3.38) 


The boundary condition at the initial time t, is obtained by differentia- 


ting (3.10). This yields 
A A 
x(t,|t) = X(tolt.) + E(t Wy (3.39) 


The quantity Ne plays the role of the Lagrange multiplier of the 
calculus of variations [15, 16]. If we let V = A(t), then the two-point 
boundary value problem may be written in the following form which closely 


resembles that obtained by Bryson and Frazier; 


IS 
~z 
IN 


ax(rit)/or = £(&(71t),w(7),7) t 7 (3.40) 


IN 
a 
IN 


wt) = Q(7)(af'/aw) d(7) te t (3.41) 


=~90— 





Me) == (aft/ox) Mr) = (@h'/2 x) Ro (7) [ 27) = b&CTIt), 7] 


ea Se (3.42) 

The boundary conditions are 
At) = 2 (3.43) 
x(t It) 2 x(t lt) + P(t) Mt) (3.44) 


For the important special cases in which the random input w enters 


linearly, the equations for X(7|t) and (7) become 
ax(tlt)/op = £(x(7It),7) + G(r)Q(r)G"(7) A(7) (3.45) 


M7) = = (2 £1/2 x) Mr) - (@h'/2 x) Fe 7) | 2) CGheG pe 





(3.46) 
We shall have more to say about the two-point boundary value 
problem later when we discuss the relationship between the estimation 
problem and control problems. 
Continuous-Time Systems Sampled _Observa 
In certain applications observations of a continuous-time system 
can conveniently be made at discrete points in time, We shall discuss 
this situation only briefly, since it requires only minor modifications 
of our previous results, 
Consider the system 
x(t) = £(x(t),t) + G(t)w(t) (3 47) 


Suppose the observations are made at equally spaced intervals of length 
T. The observed signal is 


=o1e 





2(kT) = h(x(kT),kT) + v(kT) (3.48) 


We again assume that the random input w(t) and the measurement noise v(kT) 
are Gaussian with zero mean. The covariance matrices for v and w are as 


follows; 


E [ w(t)w' (7) | 
E [ v(kT)v' (jT)] 
EB [ w(t)v' (kT) ] 


Q(t) JS(t-7) 


RT) Sy, 


0 


If we are between observations n and n+#1 (nT <= t < (n+1) T), then by 


4 


analogy to previous work, we may minimize the following functional; 


| 2 ’ 
I(t) = $ ]x(o)-x(0lom + 
(t) | x(0)-x(0 lo ted oer 


2 ilw(?) i dy 
Q (7) 
n 2 
+ >) $l Get) Ge(et), kT) | oy nt 4% < (nH1)T 
k=o R ) 
(3.49) 
subject to the constraint (3.47). 

Here we have assumed for convenience that observations begin at 
t=o, Note that because of the sampled nature of the observations, the 
estimate will be discontinuous at the sampling times, We use x(olo”) to 
designate the mean of the a priori distribution for the initial state, 
This corresponds to m of the discrete-time problem, Similarly, x(olo*) 
would correspond to x(o|0) of the discrete-time problem, 

Using a standard engineering technique, we let i(t) be a train 
of unit impulses with the impulses occurring at the sampling times kT. 
Then (3.49) may be eee in the following form which closely resembles 
(342): 





Zz t Z 
I(t) = $1) x(o)~k(o[on) J + fo { serrizcrrncer),7 | 
P ° R 


"(0) (7) 


2 
+ Il wr) Il j acy 
QQ @) (3.50) 


subject to the constrant (3.47). Proceeding formally 


’ ' Z 
Ve = = $V, GOG'V,-V_  £(x(t),t) +3 i(t) ll z(t)-h(x(t),t)| att) 


(3.51) 
Note that Ne is discontinuous at the sampling times so that the limit 


used to obtain (3.51) by analogy to (3.7) does not exist at the sampling 
instants, Proceeding in a purely formal manner, we now use the same 
approximation technique that was employed in section 3.5 to obtain the 


following equations: 


dx" (tlt)/at = f(x (tlt),t) + a(t) P*(t*) (ah'/ox)R”! [ 2(t)-h(x* (tit), t)] 
(3.52) 


° *.. | 


* 
me =p '(eg/ax) - (a£t/ax) P' ~ p*= gegt Pp! 


+ a(t) {a[(ab'/ox) Bh) /ax] (3.53) 


Ms © <_ 
At the sampling instants the identity P en PP ! P breaks down, 


Between the sampling times we have 


© 
P(t) = (of/ax) P(t) + P(t) (af'/ax) + G(t)g(t)at(t) 

for kT < t < (kHi)T (3.54) 
At the sampling instants we have the following formula resembling the 


results of the discrete-time problem: 


P(kt*) = [2 + p(k) { a[(on'/Ox) RB" ny /ax}|"' Por) (3.55) 
Note that the values of P'(kT*) and (e h/ Ox) are only 


needed at the sampling times, The estimator could, therefore 
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consist of an analogue model of the basic system in conjunction with a 
special purpose digital computer, Such 4 hybrid scheme would simultaneously 
avoid the difficulties of approximating the nonlinear system on a digital 
computer and performing a large number of multiplications on an analogue 


computer, 


28 Relation Between the Estimation Problem and Control Problems 





The fact that the solution of the linear filtering problem 
is formally the same as the solution of the problem of controlling 
a linear system with quadratic performance criteria was first pointed 
out by Kalman [ 31] and the existence of this duality is by now well 
known, Since this duality has received ample coverage in the literature 
(25, 29, 31, 32, 33, 34] we shall not dwell on this point except to 
mention that the nature of this duality is made especially clear by the 
present formulation of the estimation problem in which the estimation 
procedure involves the minimization of a quadratic "cost" functional. 
The close connection between the observability of a system and 
the controllability of the adjoint system will be discussed in Chapter IV. 
It should be clear from the present formulation that there is a 
close connection between the nonlinear estimation problem and certain 


nonlinear control problems, For example, consider the following system: 


x(t) = £(x(t),t) + G(t)a(t) 
y(t) = h(x(t),t) 


Suppose that we want the output y(t) to follow a known function z(t) 


and agree to choose the control u ] to minimize the following per- 


coe 
formance functional; 
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T 2 Z 
4 e d 
bf [acd - necerert iy + deed ae t 


0 


This functional is a weighted combination of a quadratic measure of the 
error and the total "energy" expended, 

Various methods may be used to attack this problem. We shall 
find it convenient to use the Hamiltonian point of view, so much in vogue 
in the current control literature, Since this section is an aside, we 
shall assume that the reader is familiar with the basic concepts, For 
extensive treatment of these and related matters, the reader is referred 


to i, a Lor TO, BOS 252+ 61]. 
Let us define the following Lagrangian: 


'@) 


2 
L(x(t),u(t),t) = $l|z(t) - h(x(t),t) | 
R° n't) 


2 
+ || u(t) || 
Q 


Using the "Maximum Principle", we consider the following Hamiltonian; 


H(x(t),p(t),t) = Ee p'(t) [| £(x(t),t) + G(t)u(t)] - er 
u 


The maximizing value of u(t) is easily obtained by differentiation 
xk 
u (t) = Q(t)G"(t)p(t) 
Substituting this value into the expression for H yields 


2 2 
H(x(t),p(t),t) = pi (t)£(x(t),t) + lionel, ~ $|)2(t)-h(x(t),t) || Ae 


The canonical equations for this problem, 


x(t) = 3H/ap 


p(t) = -dH/ax 
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are the same as (3.45) and (3.46) if we equate p(t) and \(t). Further, 
we can easily recognize (3.9) as the Hamilton-Jacobi partial differential 


equation [15, 16, 30}, 
Vy, + H(x(t) Vist) =70 


We can make the boundary conditions the same as those for the estimation 
problem by leaving the final state x(T) unspecified and by assigning a 


quadratic "cost" || x(t_) = x(t ) 1° for starting in a state other 


than a nominal initial state Tee). Specifying the initial state would 
correspond to the smoothing problem in which the initial state were known 
exactly; that is, P(t.) = 0. 

The regulator problem z(t) = o corresponds to a smoothing problem 
in which the observed value of signal plus noise is equal to zero on 
[t,,T]. While such a set of observations is extremely unlikely, it is, 
nevertheless, the most likely curve Zr 09 for a stable system with the 
mean of the a priori distribution for x(t.) equal to zero, 

‘While this formulation of the control problem is somewhat arti- 
facial mice ercen: for the regulator problem, the desired output is 
usually not known in advance, the point we wish to make is that the 
problem of finding the minimum energy control so that the output of the 
system follows a desired trajectory using a quadratic performance criteria 
is mathematically identical with the problem of finding the most likely 


random input after making noisy observations of the output. 
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CHAPTER IV 
OBSERVABILITY AND CONTROLLABILITY 
4041 Introduction 


The purpose of this chapter is to give a brief introduction to the 
concepts of observability and controllability and to relate these ideas 
to the present formulation of the estimation problem, These concepts 
were introduced by Kalman [32, 34] and have been studied by him [| 33] and 
his associates [70]. The paper by Bertram and Sarachik represents an 
early contribution, [69] . 

We shall confine our attention to discrete-time systems since 
certain problems arise in discrete-time systems which do not exist in 
their continuous-time counterparts and since the results are easily ex- 
tended to continuous-time systems which have been studied more extensively 
in the recent literature [33, 70]. 

The definitions of controllability and observability which we 
shall introduce will differ slightly from those given elsewhere, The 
modified definitions will enable us to include linear systems with singular 
transition matrices and =-- in principle -- nonlinear systems, They will 
also simplify the development of the relation between observability of 
a system and controllability of the adjoint system which is of particular 
Significance in the estimation problem. 

The material in section B6 of Appendix B will be continually re- 


ferred to and provides necessary background for this chapter, 


4.2 Preliminary Considerations 


Consider the following linear discrete-time system; 


aos 





x(k+1) = F(k)x(k) + G(k)u(k) (4.1) 
y(k) = H(k)x(k) (4.2) 


In section 1,3 we showed that the forward transition matrix for this 


system may be expressed in the following form; 

E(m+1 ,k+1) = F(m)F(m-1) ° ° © F(k+1) m>k (4.3) 
The adjoint system of (4.1) and (4.2) is defined to be 

x(ke1) = F'(k)x(k) + H'(k)u(k) (4,4) 

y(k) = G'(k)x(k) (4.5) 
The backward transition matrix for the adjoint system is 

Z(k,m) = F'(k+1) © © ° F* (mot )F' (m) m>k (4,6) 
Comparing (4,3) and (4.6) we obtain the following important identity; 

& (m+ »Kt1) = a (k,m) (4.7) 


The relation between a system and its adjoint will be discussed in some 


detail in the sequel. 


4.3 Definition of Observability 


The question of observability has to do with the basic structure 
of the system under consideration, Roughly, the question is this; "Given 
a finite sequence of exact measurements of the output of an unforced system, 
can one uniquely reconstruct the corresponding state sequence?" If the 
answer is yes, the system will be called completely observable on the 


Sequence of times during which the measurements are made, 
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In order to develop some insight we shall first examine the 
question of observability for a time-invariant discrete-time system 
before giving a more precise definition of observability and considering 


its ramifications, Consider the following system; 
x(k+1) = F x(k) + G u(k) (4.8) 
y(k) = H x(k) (4.9) 


with the state vector x, and n-vector, and the output y. If the system 
is unforced,u = 0. Suppose that we make n consecutive measurements of 
the output y. For convenience we assume that the first measurement 
occurs at time zero, Clearly, this involves no loss of generality in the 
time-invariant case, We may then write the following equations relating 


the output sequence {¥(0)»000sy(n=1 )$ and the initial state x(o); 


y(o) = H x(o) 
y(1) =H F x(o) 


° 
° 


° 


y(n-1) = HF! x(0) 


or, equivalently 


y(o) H x(o) 
y(1) HE 
y(net ) H pel (4.10) 


We wish to know whether or not the initial state x(o) can be determined 
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uniquely as a result of the measurements of the output. Once the initial 
state x(o) is known, the sequence of states { x(k)§ (for k = 0) of the un- 
forced system is given by (4.8). We implicitly assume that (4.10) has a 
solution; that is the output peeterice does not contradict the model of 
(4,8) and (4.9). From elementary matrix theory we know that x(o) can be 
determined uniquely if and only if the matrix on the right side of (4.10) 
or, equivalently, its reanseae [H', Re H'] has rank n. More- 
over, if the initial state of the time-invariant system cannot be deter— 
mined after n observations of the output, additional observations will 
convey no new information because, by the Cayley~Hamilton theorem Ar 
any positive integer power of ann by n matrix F may be expressed as a 
manear Combination of 1, F,..<«; piel and, therefore, additional observations 
are simply linear combinations of the first n observations. 

In devising a workable definition of observability for discrete 
time systems we are faced with a problem that does not arise in continuous- 
time systems, namely that a discretejtime system may be described by a 
forward difference equation, for example (4.8); or by a backward differ. 
ence equation, for example (2.34). 


Consider the forward system 

x(k+1) = £(x(k),k) (4.11) 
and the backward system 

Xx(ke1) = £(x(k),k) (4,12) 
the output of both systems being given by an equation of the following form; 


y(k) = h(x(k),k) (4.13) 
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In order to include both types of systems, we propose the following 
definition of observability. 
Definition: A discrete-time system will be called completely observable 
on a set of times va Jt1 000 M1 ym} if, by observing the output sequence 
{¥(5)oeoeo¥(m) § » one can uniquely determine the state sequence | xCS) een can) 
for all possible state sequences, 

It is clear that for the forward system (4.11) and (4.13) it is 
necessary and sufficient to determine x(j) and for the backward system 
(4.12) and (4.13) it is necessary and sufficient to determine x(m). 


4,4 Definition of Controllability 


The question of controllability, like the question of observability, 
has to do with the basic structure of the system under consideration. 
Roughly, the question is this; "Given any two states X, and X5» can one 
choose a finite input sequence in such a manner that the corresponding 
state sequence will be of the form { x; pecesXp{?" If the answer is yes, 
the system will be called completely controllable on the set of times 
during which the state sequence occurs, 

In order to develop some insight we shall again consider the 
question of controllability for a time-invariant discrete-time system 
before giving a more precise definition of controllability. Consider 


the following system: 
x(k+1) = F x(k) + @ u(k) (4.14) 


with the state vector x, an n-vector, and the control input, u. Suppose 
that we wish to move the system from a known initial state x(o) to a 


desired state X, at time n. We may then write the following equations 
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relating the state sequence and the control sequence; 


x4(n) = FY x(o) Hee G u(o) +*° © © +G u(n-1) (4,15) 
We may rewrite this last equation in the following forn; 


X,(n) aero) = (rg Gieecetecs G. |al| BCoy 


6 
© 


0 


| w(n-t ) | (4,16) 


We see that a solution of (4.16) will exist if and only if x x(n) = ig x(o) 
is a linear combination of the colwm vectors of the matrix Ta! Grooo, F GG]; 
in which case x,(n) - F’ x(o) belongs to the range of that matrix, If. the 
matrix ee = Gye dc, G 0G | has rank n, then any final state may be reached 
from any initial state inn transitions, Moreover, by the Cayley-Hamilton 
theorem, any state which can be reached from x(o) can be reached in (at 
most) n transitions. For the special case of a single input system, the 
matrix G becomes a vector g and we obtain the results that every final 
state may be reached from every initial state if and only if the eT 
(F' ¢,..6. Fg, g]is non-singular and that any state which can be 
reached, can be reached in (at most) n transitions, These results are well 
known (32, 69 | and are the basis for the design of "dead beat" systems, 

In devising a workable definition of controllability we are again 


faced with the problem of forward and backward difference equations. Consider 
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the forward systems: 

x(k+1) = £(x(k), u(k),k) (4217) 
and the backward system; 

x(k=-1) = £(x(k), u(k),k) (4.18) 


In order to include both types of systems, we propose the following defini- 
tion of controllability. 
Definition: A discrete.time system will be called completley controllable 
on a set of times {j, j+1,.00, mo" mi if for every x, and x, a control 
sequence f w(j).e00,u(m) § exists such that x(j) = X, and x(m) = Kno 

It is clear that for the forward system (4.17) only the sequence 
{ u(j),0o0,u(m=1)} is pertinent and for the backward system (4,18) only 


the sequence { u(j+1), 00. »u(m)$ is pertinent, 


4.5 Controllability and Observability for Linear Discrete-Time Systems 





The questions of observability and controllability for time-varying 
linear discretetime systems are closely related to those for the time- 
invariant systems already discussed, Let us consider the system of (4.1) 


and (4.2) which we rewrite for convenience 
x(k+1) = F(k)x(k) + G(k)u(k) (4.1) 
y(k) = H(k)x(k) (4.2) 


If we observe the output sequence } Gj owe il) for the unforced system, 
uzo, then we may write the following equation relating the output sequence 


{¥(3),ooc,y(m) § to the initial state x(j); 
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y(j) H(j) x(j) 
y(j+t ) H(j+i) (j+1,3) 


° ° 


o ° 


y(n) H(\tm) £(m, j) (4.19) 


Since this is just a special case of the equation Ax = y which is cone 
sidered in Appendix B, we may apply lemma B6.1 to obtain the following 


result. 


Lemma 4,51: The system (4.1) and (4.2) is completely observable on the 
set of times { jodtl ooo Mal pm | if and only if the following matrix is 


positive definite; 


mM 
Me(jom) = > | B'(k,j)H' (k)H(k)B(ky 3) 
k=Jj (4,20) 


If M.(j.m) is positive definite, then the unique solution for x(j) is 


m 
x(j) = M' (j »m) oo = (k,j)H! (k)y(k) (4 21) 
= 4 : 


From (4.21) we see that it is not necessary to know the sequence 


Pd) pre »y(m) { but merely the following sum; 


m 
S_) B'(k,g)H! (k)y(k) 
k=3 


Note that if M (3 om) is positive definite, then so is M.(j,mtk) fork > 6. 
Let us now consider the question of controllability of the same 


system, (4.1). If we consider the control sequence { u(j),.o0,u(m-1)§, 
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then we may write the following equation relating the states x(j) and 


x(m) and the control sequence 


x(m) - S(m,j)x(3) = [ B(m,5+1)G(5),-6.,G(m-1)}] | uj) 


° 


° 


u(m~1 ) (4.22) 


Again this is just a special case of the equation Ax = y and we immediately 
obtain the following lemmas as special cases of lemmas B6.2 and B6.3 re- 


spectively. 


Lemma 4,52: The state x(m) may be reached at time m from the state x(j) at 
time j if and only if x(m) = &(m,j)x(j) belongs to the range of the follow. 
ing matrix: 

M= 1 


W.(j,m) = &(m,k+1 )G(k)G! (k)E* (m,k+1 ) 
: =z (4,23) 


In this case, the solution which minimizes 


m=1 Z 
$ |ju(k) || 
k= 
is 
w(k) = G*(k)Em, K+ Wh (j,m) [x(m) = E(m,5)x(3)] Gy vvegmet (44.24) 


Lemma 4,53; The system (4.1) is completely controllable on the set of times 
ae J*1 yoo oy M1 »m{ if and only if We(j,m) is positive definite. In this case 


the solution which minimizes 
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IMn= 1 2 
$ || u(k) |] 
=J 
is 
u(k) = G'(k)£' (m,k+1 We”! (50m) | x(m) = E(m,j)x(5)] k=, 000,m~1 (4.25) 


°o ° ° 


Let us now consider a system described by a backward difference 
equation. In particular, let us consider the adjoint system given by (4.4) 


and (4.5) which we rewrite for convenience 
x(ke1) = F'(k)x(k) + H'(k)u(k) (4.4) 
y(k) = G'(k)x(k) (4.5) 


In order to examine the question of observability we write the following 
equation relating the output sequence ; y(j),cocsy(m){ and the state x(m) 


for the unforced systen: 


y(3) G'(j) Bj om) x(m) 
y(me1 G'(m=1) ¥(me1 ,m 
y(m) G! (m) (4,26) 


Again we note that (4.26) is just a special case of the equation Ax = y 


and we immediately obtain the following lemma as a special case of B6.1. 


Lemma: 4.54: The system (4.4) and (4.5) is completely observable on the 
set of times eyereee m{if and only if the following matrix is positive 


definite; 
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m 
M(j.m) = Z' (k,m)G(k)G" (k)E(kym) 
My(Jem) = 2) £1 (eam) Q(%) m as 


If M (j.m) is positive definite the unique solution for x(m) is 


m 
x(m) = M7! (jm) Do, E! (kam) CK) ZC) _ 
=] ° 


° ° ° 


Using the identity (4.7) we may write (4.27) as follows; 


mh 
M (j,m) = B(m+1 ,k+1 )G(k)G' (k)B! (m+1,k+1) = W.(j,m+ ) 
=o 2 "= (4,29) 


A comparison of (4.23) and (4.29) yields the following theorem, 


Theorem 4.55: The system (4.1) and (4.2) is completely controllable on 
the set of times { j,...,m#i$ if and only if the adjoint system (4.4) and 
(4.5) is completely observable on the set of times { j,...,m}. 


° © ° 


Let us now examine the question of controllability of the adjoint 
system, We consider the control sequence | u(j+1),.0.,u(m)§ and obtain 
the following equation relating the states x(j) and x(m) and the control 


sequence : 


x(j) = E(j.m)x(m) = [ H (G41), 000 E(5,m=2)H! (mot) 2 (j,m-1 )H'(m) } ju(j+) 


6 


u(m-t1 ) 
u(m) 


(4,30) 
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Again this is just a special case of the equation Ax = y and we obtain 


the following lemma as a special case of B6.3. 


Lemma 4,56: The system (4.5) is completely controllable on the set of 
times {j,j+1,.00,m=1,m}if and only if the following matrix is positive 


definite; 


m 
W,(j,m) = E(j,k=1)H! (k)H(k)E! (j,k=1) 
Dee oes : ’ (4,31) 


ni Wh (5 om) is positive definite the solution which minimizes 


m 2 
ee Com 
k=j+1 
is 
oo | 
u(k) = H(k)E'(j,k=1 Ww, (jom) [x(5) = £CG.m)x(m)] k= j+1, 005m 
o o ° (4,32) 
Using the identity (4.7) we may rewrite (4,31) in the following 
form; 


nm 
We (dom) = DEN (ky j#1 HY (ik) H(K)E (ke JH) = Mp (S41 sm) 
k=j+1 (4.33) 


A comparison of (4,20) and (4,33) yields the following theoren. 


Theorem 4,57: The system (4.1) and (4.2) is completely observable on 
the set of times {j,j+1,.0.,m} if and only if the adjoint system (4,5) 
and (4.6) is completely controllable on the set of times { jo-1,Jj,000.mf. 


° ° ° 


It is interesting to note that the lemmas given elsewhere [70] 
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correspond to the results we have obtained for the backward system, For 
discrete-time systems in which F7! (k) exists and for continuous-time sys- 
tems the distinction between forward and backward equations does not arise, 
In these cases the basic lemmas may be stated in alternate forms corres- 
ponding to those we have obtained for the forward and backward systems. 

We may easily relate theorem 4.57 to the general solution of the 
linear smoothing problem given in Appendix D. Consider the =e in which 
the random input w and the measurement noise v are uncorrelated, For 


convenience we rewrite (D.47) in the following form; 


Mket) = Ft(k) Ak) + Hk) [HCRPCHN Ic) + RC] * 
fa(k) = H(k) [ X(klke1) + P(k)FY(k) \(k)] § (4,34) 
In addition from (D.48) we obtain 
X(oln) = m + P(o))(-1) (4,35) 


From (4.35) we see that the estimate of the initial state may take on 
all possible values only if \(~1) can take on all possible values, but 
from (4.34) we see that \(-1) can take on all possible values. only if 
the adjoint system is completely controllable on the set of times 
lees sn The solution of the linear estimation problem given in 
Appendix D is optimal even if the system is not completely observable 
and the optimal procedure is simply to estimate the a priori mean for 
those states which are unobservable, 


If we note that the expression 
a(k) = H(k) [x(klke1) + P(k)E*(k)\(k) J 


appearing in (4.34) is the estimate of the measurement noise v(k) we 
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see that an absurd situation would arise if the system were not come 
pletely observable and the adjoint system were completely controllable. 
For then we would be using the estimate of the measurement noise to modify 
the a priori estimate of certain state variables when the measurements 
themselves contained no information about those state variables, 

We may also relate theorem 4.55 to a simple control problem, Con- 
sider the problem of moving the system (401) from a specified initial 
state x(j) to a desired state X, inn transitions using the control 
sequence for which the "energy" expended is minimum, "nergy" is defined 


to be 


Ma 1 Z 
Se $ |lu(k) | »m = j+n 
k=j 


To solve this problem we minimize the following expression; 


Ma 1 2 
2, #i/zCs) | + 4's) [xGcH) - B(c)x(k) - G(k)u(k)] 
k=o (4,36) 


subject to the constraint 


M= | 
x4(m) - E(m,j)x(3) = >) B(m,k+ )G(k)u(k) 
k=} (4.37) 


Setting the partial derivatives of (4.36) with respect to u(k), A(k) and 


x(k) equal to zero yields the following equations: 


x(k+1) = F(k)x(k) + G(k)) (k) (4.1) 
A(k=1) = F'(k)\(k) (4,38) 
u(k) = G'(k))(k) (4,39) 
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Note that u(k) is the output of the adjoint system. From (4,38) we may 


obtain the following expression; 
Mk) = £(kym-1 )A(m=1 ) (4,40) 


Substituting (4,38) and (4.40) into (4.37) yields 
Meo 1 
x, (m) - E(m,j)x(j) = 2 B(m,k+ )G(k)G' (k)E(k,m=1 )\(me1 ) (4.41 ) 
=a 
Using the identity (4.7) we may rewrite (4.41) in the following 


two equivalent forms; 
X4(m) = E(m,j)x(3) = We(j,m)A (met ) m= jin (4,42) 
x, (a) - &(m,j)x(j) = M (Jj smet )d(m=1 ) »m = jt (4,43) 


Lemma 4,52 says that the state xX, may be reached from x(j) in n transitions 
if and only if some A(m-1) satisfies (4.42). Lemma 4.53 says that every 
state may be reached from x(j) in n transitions if and only if there is 
a unique \(m-1) satisfying 4.42. If we consider u(k) to be the output of 
the adjoint system then, by lemma 4,54, \(m-1) can be uniquely determined 
on the basis of the following sum: 

m= 1 

X,(m) = B(m,5)x(5) = >, £' (m=1,k)G(k)u(k) 

k=) 
if and only if the adjoint system is completely observable on the set of 
times Nines pele | - Theorem 4.56 simply says that the system is completely 
controllable if and only if the Lagrange multiplier \(m-1) may be determined 
uniquely; that is, if and only if the adjoint system is completely observable 


on the set of times{ j,...,m=1}. 
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4,6 Nonlinear Systems 


At present almost nothing is known about controllability and 
observability for nonlinear systems, A theorem on local controllability 
of nonlinear systems was proven by Markus and Lee [71] by assuming that 
the linear variational equations were completely controllable, The dis- 
cussion of their paper by Kalman is also of interest. 

In spite of the present lack of understanding of these matters, 
there still seems to be a strong connection between observability of a 
nonlinear system and controllability of the adjoint system. This is in- 
dicated by the fact that the ubiquitous two-point boundary value problems 
involve the adjoint equations. For example, consider a system in which 
the state vector may be divided into a completely unobservable part x, 
and a remaining part Xy » in such a way that the equations for the system 


may be written in the following form: 


x, (kt) = £, (x, (k) (ke) ok) 


x (k+1) = £, (x, (k) X5(k) a (k)k) 
y(k) = h(x, (k),k) 
If we examine the adjoint system 
\Mket) = (Aaf'/ax) Atk) + (@b'/ax) v(k) 


or, more specifically 


Ay (ket ) 2£,/2x, 2 £,/2x, A, (ic) ah'/ax, | v(k) 
= + 

\ Cket 0 ' 

Ap (ket ) 0 2 £/2x, A, () 0 
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we find that there is no trajectory along which the linearized adjoint 
system is completely controllable. 
Similarly, if we consider a system with obviously uncontrollable 


state variables of the following form; 


X, (et) = £, (x, (ic) x, Us) a (ic) ,k) 


x (lH) = £5 (x, (ie) ok) 


we find that there is no trajectory along which the following adjoint 


system is completely observable; 


A, (k=1) af /ox, 0 A, (i) 
dp (ket) 2 £,/ex, 2£,/2x, A, () 
y=[2f/on 20] d, (x) 

X«) 


If we were to consider the problem of observing a small perturbation 
of the initial state x(j) of the system (4.11) and (4.13) we would investi- 


gate the variational equations 


§x(k+1) = (9 f£/2x) £x(k) 


Sy(k) = (ah/ox) £x(k) 


Complete observability of the variational equations along the nominal tra~ 
jectory would insure local observability of a small perturbation in x(j). 
Complete observability of the variational equations is, of course, the same 


as complete controllability of the adjoint equations. 
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CHAPTER V 


EXPERIMENTAL RESULTS 


Te Introduction 


In this chapter we shall present the results of the application 
of several of the techniques of Chapter II to some particular examples. 
The aim of these numerical studies is to give an indication of the type 
of performance that can be expected when approximation methods are used 
in nonlinear estimation problems, With regard to the approximation 
technique for the filtering problem given in section 2,81; the behavior 
of the P* matrix, the stability of the filter, the effect of known con- 
trol inputs, and the overall quality of the estimates, are all of prime 
importance, With regard to the smoothing problem: the quality of the 
first approximation sequence | x"(k|n) 3 and the convergence properties 
of the successive linearization technique, are the basic factors to be 
considered, While one cannot draw positive general conclusions from 
specific examples, we feel that the results of these simulation studies 


indicate that the methods under consideration hold great promise. 


Dee Pictures and Random Variables 


The simulation studies discussed in this chapter were performed 
on an IBM 7090 computer, The results will be Sees in a series of 
pictures: each of which represents a time history of 1000 points, time 
running from left to right. The scale for each picture will be indicated 
by giving the maximum in absolute value of the variable or by referring 
to an adjacent picture of the same scale, Positive values of the variables 


lie below the horizontal axis, 


1 1A 


Bar === 





Gaussian random variables were approximated by a sum of twelve 
independent pseudo-random variables, each having a uniform distribution 


on the interval [0,1]. 


5e3___ Example; As a first example a first order system with a randomly 
varying parameter was investigated, Such a system could be a model for 
a noisy RC network in which the resistance varied in a random manner. The 


continuous-time version of the system is given by the following equations; 


x(t) =[a + x5(t)] x1 (t) + u(t) + w(t) (5.1) 
x(t) = bx,(t) + w(t) (5.2) 
a(t) = x,(t) + v(t) (503) 


where u(t) is a known input; w, (t), Wy (t) and v(t) are white Gaussian 

noise processes; z(t) is the observed output; and the random parameter 
Xp (t) is the output of a linear system excited by white Gaussain noise, 
Because the simulation was to be conducted on a digital computer, this 


system was approximated by the following discrete-time system; 


‘ 


x4 (t+h) = x,(t) + [a + x5(t)] x, (t)h + u(t)h + w, (t)h (5.4) 
Xo(tth) = x(t) + b x (t)h + wo(t)h (525) 
z(t) = x,(t) + v(t) (5.6) 


For a = ~ 1,0, the value of .0i was used for h, This corresponds 
to a sampling period of 10 milliseconds for a system with an average time 
constant of one second, 

The recursive scheme of (2.73), (2.74) and (2.77) was used to 
obtain estimates of X, and xX, in real time, The successive linearization 
technique of section 2,84 was used to obtain a solution of the smoothing 


problem, The equations describing the estimation procedure for this 


Sis 





problem can easily be obtained by straightforward substitution into the 


basic equations of Chapter II. 


tl Case I 


The following data pertains to the simulation results shown in 


Plate I, 
system parameters: a=e1,0 
b = = 0,1 
h= 0,01 
initial conditions: X,(0) = 1.0 
Xo (0) —- O55 
a priori distribution: m, = 0.0 Py4(0) = 0.5 
m= 020 Din(o) = 0.0 
Z 174 
P59 (0) = 0.6 
noise levels: E Lw,(j)w,(k)J= 36.0 $5 


E L Wo (jw (k)] = 100 S55, 


E [v(5)v(k)] =-= 9.0 } 4k 


Wy» Wo and v are statistically independent, 
Discussion: This data set represents a rather severe noise condition as 
indicated by parts B and C of Plate I, The real time estimate of X, 
closely followed the actual value, The error in the a priori estimate 
of X» was overcome soon after the known input u excited a transient, as 
can be seen in parts G and H. 

The results of the application of the successive linearization 
technique to the smoothing problem are shown in parts F and I. The iter- 
ation procenuee was terminated when the Re BC eNeattons between the 
ith oni the i+! approximations for both x, ae X5 were less ee one per- 
cent of the maximum value of the quantity being estimated, This criteria 


was satisfied by the third approximation. The difference between the 
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first three approximations could not be detected in the pictures, 
Note that in the smoothed solution for x, shown in Part I, the error 
in the a priori estimate was overcome, 
The known input u was chosen as a representative example of the 
type of saturating signals occurring in control systems, The total com- 


puter time used for the entire simulation was 0.73 minutes, 


pe. Case ITI 


The following data pertains to the simulation results shown in 


Plate II. 
system parameters ; a= 1,0 
b= 0O,0 
h= 0,01 
initial conditions; x, (0) = 2,0 
X5 (0) =- 1.5 
a priori distribution: m, = 0.0 P,,(o) = 1.0 
mo = 0.0 Py (0) = 05.0 
P55 (0) = 0.6 
noise levels; E [ w, (3 )w, (k) Js 16.0 a 
E [ Wp (Jw, (k) J= 0.0 
E[v(j)v(k)] = 9.0 S,, 


Wis Wo and v are statistically independent, 


Discussion: For this data set the parameter X is an unknown constant, 
The severity of the noise conditions is indicated by parts B and F of 
Plate II. In part D it can S seen that soon after the known input u 
excited a transient, the estimator was able to estimate X5 very closely, 
The estimate of Xx, is shown in part H. The improvement in the estimate 


on X, with increasing time can be attributed to the information in the 


transient and the improved estimate of x,. 


o117= 





As a second example, a second order system with a randomly 


Sot Example; 


varying natural frequency was investigated, The continuous-time version 


of the system is given in the example of section 1.43. Because the sim- 
ulation was to be conducted on a digital computer, this system was approx- 


imated by the following discrete-time system: 


x, (tth) = x,(t) + X5(t)h (5.7) 
Xp(t+h) = x(t) - b x5(t)h - [ ¢, +x, (t)] x, (t)h + u(t)h + w, (t)h 
(5.8) 
X(tth) = x,(t) - a x,(t)h + wo(t)h (59) 
a(t) = x,(t) + v(t) (5.10) 


The recursive scheme of (2.73), (2.74) and (2.77) was used to 
obtain estimates of x,, x, and X in real time, The equations describing 
the estimation procedure for this problem are easily obtained by straight- 


forward substitution into the basic equations of Chapter II. 


\ 


= 


t 


“1 Case III 
The following data pertains to the simulation results shown in 
Plate III. 
system parameters: a = 0.1 
b = 0.2 
GC.) =) 360 
h = 0,01 
initial conditions: x, (0) = 0,5 
Xn (0) = 0.5 
(0) = eon 
3 
a priori distribution: 
= FOO Doe — sO Pon(0) = 0.5 
1 11 
=1050 P45 (0) =020 P53(0 = 0.0 
ae re 0.0 P1360) — 0.0 P33, (0 es 0.5 
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noise levels: E [wy (J)w, («)] = 0.16 S a4 
E [wo(j)wy(k)] = 225.0 6.) 
Elv(j)v(k)}] = 0.6 Sax 


Wy» We and v are statistically independent, 


Discussion: In this data set large and rapid changes in the natural fre- 
quency ee + x3! occur, The noise conditions are moderate as shown in 
parts B and C of Plate III. The real time estimates of the state variables 
are seen to follow satisfactorily the rapid changes in the state variables, 
Note that, since X» is the discrete-time analogy of the derivative of X4 
the job of estimating this quantity is intrinsically difficult. Again 
there is improved behavior after the known input u excites a transient, 

The behavior of the P matrix shown in parts J through 0, is typical of 

the behavior observed in a large number of runs, The total computer time 


used for the entire simulation was 0.74 minutes, 
42 Case IV 


The following data pertains to the simulation results shown in 


Plate IV, 
system parameters: a = 0.0 
b = 1,0 
c. = 0,0 
h° = 0,01 
initial conditions: x,(o) = 0.0 
x,(o) = 020 
x5(0) = 4.0 
a priori distribution: 
m, = 0.0 P, 4 (0) = sOn0 Poo lo) = 0,0 
Ms = Ved Py >(0) = 0,0 P23(0) a 0.0 
m3 = 120 P4360) = 0,0 P33(0) = 4,0 
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noise levels; E [w, (jw, (k) J = 1.0 bak 
E [ Wo (5 )wo (k) J = 0,0 
ELv(j)v(k)] = 0.01 55, 


Wi. Wo and v are statistically independent, 


Discussion: For this data set there is no known input. The natural free 
quency x, is an unknown constant, The noise condition is very severe as 
indicated by part A of Plate IV, The estimate of X takes about 750 
sampling periods to overcome the error in the a priori estimate and settle 
on a good estimate of X30 Once this has occurred, the estimates of the 


other state variables also improve. The total computer time used for 


the entire simulation was 0.64 minutes, 
ot Case V 


The following data pertains to the simulation results shown in 


Plate V. 
system parameters: a = 0.0 
b = 0.5 
Co = 0.0 
h = Q,01 
initial conditions: xX, (o) = 0,0 
X, (0) = 0.0 
X,(0) = 3.0 
2 
a priori distribution: 
m, = 0.0 Py 4 (0) = 1.0 Po9(0) = 0,5 
m, = 0.0 P42 (0) = 0.0 P5530) = 0.0 
mz = 0.0 P43(0) = 0.0 P33 (0 = 4,0 
noise levels: E (wi (j)wy(k)]= 0.25 Say 


E (wo(5)wo(k)]= 0.0 
E [vGj)}vde)| = 1 70 Say 


Wy. Wo and v are statistically independent. 
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Discussion: For this data set the natural frequency of the system is 

again an unknown constant; but this time there isaknown input, The noise 
levels are moderate as indicated by parts B and © of Plate V. Again notice 
that the estimate of X, overcomes the error in the a priori estimate soon 
after the known input excites a transient, The overall performance of 

the estimator is quite satisfactory. The behavior of the P* matrix is 


typical, Note that the "variance" decreases rapidly as the estimate 


# 
=o 
x6, (KI k) improves, The total computer time used for the entire simulation 


was 0.68 minutes, 


544 Case VI 


The following data pertains to the simulation results shown in 


Plate VI. 
system parameters: b= 1055 
co = Ze0 
h = 0,01 
(system) a = 0,1 
(estimator) a = 0.2 
initial conditions: x, (0) = 095 
X5 (0) = On 
x3(0) = 7.0 
@ priori distribution: 
m, = 0.0 Py 7 (0) =a 5 Poo (0) = 1.0 
m, = 0.0 Pyo(0) = 0.0 P93 (0) = 0,0 
pore ee Dee ees Br ee 
noise levels: E [wy (Jw, (k)] = 0,36 sx 
(system) E [wo (5 )wo (k)] = 0.0 
(estimator) E [wo(j)wo(k)] = 225.0 § " 
E [v(j)v(k)] = 71.0 5; 


Wy» Wo and v are statistically independent, 
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Discussion: For this data set the estimator had imperfect information 
about the statistical behavior of X30 The actual value was the geometric 


curve obtained as the output of the following system; 
X(t+h) = X(t) al x4 (t)h X3 (0) = 7,0 


The estimator was designed under the assumption that x(t) was the out- 
put of a linear system of a different time constant excited by white noise, 
The noise conditions were moderate to severe as is indicated by 
parts B and C of Plate VI. Despite the large error in the a priori es- 
timate and the misinformation about the statistical nature of X39 the es- 
timator quickly overcomes the initial error and gives satisfactory per- 
formance, even before the known input u excites a transient in the system, 


The total computer time used for this simulation was 0.76 minutes, 


aD Conclusions 


i 


In all cases studied the elements of the e matrix were well 
behaved, In fact, they were so well behaved that it seems likely that 
in an application of these techniques, a simplified version of the filter 
could be obtained by presetting the values of the diagonal elements of 
the P matrix to agree with the average behavior of these elements in 
previously conducted simulation studies, This would be very desirable 
if an analogue computer were to be used, bine it would reduce the 
number of multipliers needed, | 

No instahility of the estimator was observed. 

The effect of known inputs is to excite transients which contain 
information about the system parameters and simplify the estimation 


problem. The exact nature of the known input does not seem to matter as 
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long as transients are excited, 

The performance of the estimator in Case VI indicates than an 
estimator designed on the assumption that unknown parameters are the 
outputs of linear systems excited by white noise, can give good estimates 
even when this assumption is apparently unjustified, This is very im- 
portant since information about the statistical behavior of a parameter 
may be limited, This is also an indication of the power of the model 
which treats random parameters as the outputs of linear systems excited 
by white noise. In practice the time constants of these fictitious linear 
systems would be chosen to reflect the anticipated rapidity of the en- 
vironmental changes. 

The quality of the estimates depends on the difficulty of the 
problem; that is, on the presence of known inputs and on the noise levels. 
The examples presented, for the most part, represent cases of rather 
severe noise pone ions in which satisfactory estimates were repeatedly 
given, A noisy noise annoys an oyster but apparently, the nonlinear 
estimator is a pearl of a different sort. 

The method of successive linearization converged rapidly to the 
solution of the nonlinear two-point boundary value problem in the cases 
investigated, The first approximation {x"(k|n)} was excellent in all cases 
and could be used with confidence as the starting point in any method 


of successive approximations. 


Hiege 
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A: Known input u(k) 
Max: 10.0 





C: Observed output z2(k) 
Max: o25 





E: Real Time Estimate x, (k {k) 
Secale of D 














B: 


BD: 


F; 


Total input u(k) + w(k) 
Seale of A 


State variable x, (k) 
Max: 9.5 


Successive Linearization 
i1=1, 2, 3 £4xSeale of D 


x; (k|n) 





-\2A5- 





G: State variable X- (k) 
Max: 0.51 





H: Real time estimate x5 (kIk) 
scale of G 





Ts euge acer linearization om (kjn) 
Pes ol ee. 4 we eabe- o1 -G 


PLATZ I (con't,) 





— lAo- 


A a ry 
Bear) U 
Dea rit tn 


Fat 
Ca 


‘ és bs ot 2° lad ed 
(att ah e Je es OG 





A; Known input u(k) B; Total input u(k) + w(k) 
Max: 10.0 Seale of A 





C: State variable X5 (k) D; Real time estimate x5 (k 1k) 
Xp (k) ea eS Scale of C 





EB: Successive Linearization x, (kIn) 
i= te 253 seale of -C + 


PLATE II 
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x) 


Observed output z(k) 
Max: 13.4 





G: State variable x, (k) H: Real time estimate xj (kik) 
Max: 4,4 Scale of G 





I; Successive linearization J: Successive linearization 
xy, (kin) xy, (kin) 
i=1 Seale of G i Oe Seale of G 


PLATE II (con't.) 
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A: Known input u(k) B: Total input u(k) + w(k) 
Max: 5.0 Seale of A 





C: Observed output 2(k) = x, (k) + v(k) 
Max: 4.97 





D: State variable x, (k) E: Real time estimate x; (k|k) 
Seale of C Scale of C 


PLAT 


t 
a 
Hi 
HH 








a 





F: State variable X5 (kK) G: Real time estimate x5 (k| k) 
Max: 3.5 Seale of F 





H: e(k) = co t+ Xx (Kk) I: Real time estimate 
e"(kik) = cy + x3 (k1k) 


Oo 


J: Pii(k) Scale of 0 K: Pyo(k) Seale of 0 


Max: 8.85 ,c. = 3.0 Scale of H 








N: P53(k) Scale of 0 O: P5a(k) Max: 3.24 


PLATA iii (eon! t..) 
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A: Observed output 2(k) = x, (k) + v(k) 
Max: 0.4 





3: State variable x, (k) C: Real time estimate x, ” (ki k) 
Seale of A Seale of A 





D: State variable x,(k) E; Real time estimate x, ” (klk) 
Max: 0.2 Seale of D 





F; State variable X3(k) G; Real time estimate X, ~(K\K) 
Value + 4,0 Seale of F 


PLATE IV 
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A; Known input u(k) B: Total input u(k) + w(k) 
Max: 5,0 Scale of A 





C: Observed output z2(k) = x, (k) + v(k) 
Max: 5.28 





D: State variable x, (k) E: Real time estimate x, (kk) 
scale of C Seale of C 


re 
> 
s 
i<t 
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F: State variable Xo (k) G: Real time estimate x5 (k| k) 
Max: 2,44 Seale of F 








H: State variable x3 (Ic) I: Real time estimate x3 (kk) 
Value 3.0 ocale of H 


J: Py,(k) Scale of 0 K: Py o(k) Seale of 0 





4 Z * 
ii P43 (x) Seale of 0 M: Poo (k) Seale of 0O 





~ ° Z A ; 
N: Pog (k) Scane of 0 O: P39 (k) Max: 4,0 


PLATE V (con't.) 
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A: Known input u(k) B: Total input u(k) + w(k) 





C: Observed output z(k) = X, (k) + v(k) 





D: State variable x, (kK) H; Real time estimate x, 1 (k Ik) 
Seale of C Seale of C 
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F; State variable x»(k) G: Real time estimate x5(kIk) 
Max: 1.4 Seale of F 





H: State variable X (k) I: Real time estimate x3 (k|k) 
Max: 7.0 Scale of H 





J: e(k) = c, + X3 (ic) K: Real time estimate 
e*(klk) = ors 269 (Ik) 
Max; 9.0 , c. = 2.0 Seale of J 


PLATE VI (con't. ) 





CHAPTER VI 
CONCLUSION 


Our research has been devoted to a fundamental problem of con-~ 
siderable importance in modern control theory, The linear problem has 
been solved (Appendix D) and a scratch has been made in the armor of 
nonlinearity. 

The experimental results of Chapter V are extremely encouraging 
and indicate that the approximation techniques may have wide applicability, 
The results show that better performance can be expected when some inputs 
to the system are known, Such known inputs usually excite transients, 
the effect of which can be measured at the output, These transients con- 
tain information about the state of the system and the values of various 
parameters, One can further speculate that if information about a parti- 
cular parameter is not available in the observations, then the performance 
of the system is relatively insensitive to variations of that parameter 
and the need for accurate estimation is diminished, 

The approximate solution to the nonlinear filtering problem is 
immediately applicable to adaptive control problems, Note that if measure. 
ments can be made of the input and the output of a subsystem containing a 
particularly troublesome parameter, then the order of the problem can be 
reduced significantly by treating only that subsystem. 

The assumption that various noises are Gaussian is equivalent to 
making a leastesquares estimate when only means and covariances are known, 

Like most research programs, this report raises more questions than 
it answers, It is hoped that the following list will stimulate thought, 


discussion and future research, 


Pics 





A theoretical question of great practical importance is that of 
stability, The stability of the linear filter has been shown by Kalman 
( 29] using the direct method of Liapunov [40], It seems likely that the 
stability of the nonlinear filter obtained by linearization about the 
present estimates can be proved under certain restrictions, What are 
these restrictions? 

It would be interesting to know the most general types of systems 
for which the probability density function (2.7) is unimodal, Convexity 
of (2.8) is a sufficient condition. 

A fast nie numerical procedure for solving the smoothing problem 
guaranteeing convergence would be of great value. 

The areas of observability and controllability are suitable for 
theoretical research, 

If a stationary random process is viewed as the output of a linear 
system excited by white noise, then the approximation techniques might 
be used to identify the parameters of this fictitious system, It wouid 
be interesting to compare the results of such an approach with the more 
conventional approach of measuring the correlation function or the spectrum 
and then approximating the spectrum by a rational polynomial. 

It seems likely that a combination of the method of successive 
linearization and the filtering technique could be used to obtain better 
estimates in real time, That is, instead of linearizing about the present 
estimate, one could smooth over the past N observations, This seems to 
be a real possibility in applications where sampling periods and computa- 
tion times are of the same order of magnitude, 

The development of approximation techniques for use in conjunction 


with the dynamic programming formulation of the discrete-time problem 
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would be an important contribution, Such techniques would be immed- 
iately applicable to a wide range of dynamic programming problems, 

It would be interesting to apply the successive linearization 
technique to the nonlinear regulator problem discussed in section 3.8, 

It would be desirable to simplify the nonlinear filter as much 
as possible without impairing performance, Presetting the values of the 
diagonal elements of the P matrix is one possibility, Another possibility 
is to use the basic configuration of the filter but to set certain para. 
meters by the method of stochastic approximation [38, 56]. 

Finally, there is need for further simulation studies applying 


the methods of this report to more complicated systems, 
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APPENDIX A 


MATRIX IDENTITIES 


Al Quadratic Forms 


We begin by summarizing a few facts about matrices so that 
they will be available for reference, 


The length (or Euclidian norm) of a vector x is simply 


Jal = Vay" + ove x2 


Clearly, || x || > o for x # 09. 
Let Q(x) be the quadratic form associated with a symmetric 


matrix A. That is, 


9 ae e => a.. 


Q(x) = x'éx = 2 a at eae) 


x.X. 
Ajig 
A is said to be positive definite if Q(x) > 0, for x #o. Clearly, 
Q(o) =o. A is nonenegative definite if Q(x) = 0, for x #0. Let 


A and B be non-negative definite and let C = A+B. Then 
x'Gx = x'(A + B)x = x'ax + x" Bx 


Thus, € is at least non-negative definite, C would always be positive 
definite if either A or B were positive definite and the other were non- 
negative definite, 

Let D be any matrix. Then D'D is non-negative definite, since 
x'D'Dx = || Dx If =o, If D is non-singular, then D'D is positive definite 


Since in this case Dx = o implies x = o. 
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Conversely, any positive definite matrix A may be factored into 
a product of the form D'D where D is non-singular. To see this we re~ 
call that the characteristic numbers of a positive definite matrix are 
positive and that any symmetric matrix may be reduced by an orhtogonal 
transformation to a diagonal matrix having its characteristic numbers 
on the diagonal, Let A be positive definite and let T be the appropri- 


ate orthogonal transformation (T! = qr! ). Then 
T'AT = A = diag( \,oe00e dy) » AG > Oo 


A suitable D' is T! diag( V4 plea as If A were non-negative 
definite D would be singular, since some of the terms on the diagonal 


1 


of AV would be zero, If A were positive definite, then A could be 


written in the following form: 


an = EP diag(1/ \4,00¢,1/) ,)2! 


which shows that real 


is positive definite, 
Finally, if A is non-negative, then H'AH is at least non- 


. aA 
negative definite, since x'H'AHx =]DHx|= o. 


AZ __A Visual Approach 


In the opinion of the author, the use of block diagrams or flow 
graphs enables one to produce matrix identities far faster and more simply 
than by straight algebraic techniques, We will view a matrix as a linear 
operator, For example, the equation y = Ax we interpret as a system 
A operating on an input x to produce a unique output y. Similarly, the 
equation y = ABx is given the interpretation that first B operates on 


x and then A operates on this result to produce y. See Fig. A-1. 


ies 





Fig. Au 


Some basic axioms of matrix theory and the associated block diagram 


manipulations are given in Fig. A-»2, 





Fig. A=2 
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The question of placing a matrix inside a feedback loop is 
slightly more complex but is of extreme importance. Consider the 


"system" of A=3, 





Fig. Ax3 


This "system" implies the equation y = x-Ay or|/I+A]y =x. Note 
that the output y is specified uniquely by x only if [i + om exists. 
Thus, the diagram of Fig. A=-3 is valid only if ia + ay exists, since 
only then is the output uniquely specified by the input and in our 
basic concept of a system we require this, 

In order for a block diagram to represent a valid system the 
output of each individual element of the block diagram must be uniquely 
specified for each possible input. The block diagram manipulationsof 
Fig. A=2 may then be performed without affecting this uniqueness to 
obtain equivalent systems, 

Some examples will now be given to show how the ability to 
picture a matrix operation in the form of a block diagram is of great 


assistance, Important relations are obtained in Example 3. 


Example 1. Show that 


=| 


[r+asy =(a7' + py ta" if a’ exists 
and 
1. w-1r 2-1 =| ; ~ | : 
Pe AD eee a ty if BU exists 
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We use the visual aid of the block diagram of Fig. A-4, although a 


straightforward algebraic proof is simple for this simple problem, 





Fig, A—4 


Note that in moving blocks around the loop it is clear when the 


additional assumption must be made as to the existence of some inverse, 


Example 2, Consider the question, "Does the existence of [I + AB] 
imply the existence of [I + BA]™! 7" If so, determine the latter in 
terms of the former, We note that from Fig, A~5 [I + AB]! A =A[I+ Ba]! 9 


so the answer to the question is yes, 





Fig. A-5 


To obtain an expression for the inverse we introduce the quantity 
W= xX-By. Then, by the definition of y in the diagram on the left side 
of Fig. A-5, 


w=(E-B(Z+ apa] x 
but, by the diagram on the right side of Fig. A-5, 


w = [I + Ba)” x 
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for all x, Thus 


ie 1s Bit + wes a (A.1) 


which may be verified by direct multiplication. 


Example 3. ‘As a third example we shall develop two matrix identities 
of practical importance. Suppose Re exists and consider the expression 
PH' [| HPH! +R. In Fig. A-6 we simply back things around the loop to 


obtain the first desired result by inspection, 


pH! [ upu' +R)! = [1+ Pury]! Pur (A°2) 





Fig. Auv6 
Using (A.2) we note that 
L- PH! [ HPH' +R]! H = £ - [1+ PH'R'H)7! puR'H 
Rewriting I as a product, this relation is equal to 


= | 
H] - [L+ Pura) par” 


H 


Canceling terms, we obtain the identity 


Ces) 


L- PHY PH! +R] Wo = [1 + PHYR H) 


This result may be verified by direct multiplication, Note that (A.3) 


could also have been obtained by substituting PH' for B and Rv'H for A 
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in (A.1) and then bringing ae inside the inverse. 

The easy way in which the results (A.2) and (A.3) were obtained 
using the graphic aid of the block diagram is an indication of the 
power of this approach both for obtaining identities and proving the 


existence of inverses, 


Hoe 





APPENDIX B 
THE GENERALIZED INVERSE OF A MATRIX 
Bi Introduction 


The concept of the inverse of a matrix was first generalized by 
Moore [47 , 48] to include rectangular and singular square matrices. 
Later, Penrose [50, 51] independently introduced the same basic concept 


Farther discussion of the generalized inverse was given by Greville 


fei, 22) 

Penrose proves that for any matrix A there is a unique matrix 
at satisfying the following four relations: 

aah = 

(ant)! = aa® 

atast = at 


(atay! = ata 


and proceeds to discuss the properties of the generalized inverse, 


Kalman [29] defines a pseudo~inverse of a matrix A to be any 
T 


matrix satisfying (B.1) and shows how such a pseudo-jinverse may be 
used in linear filtering and prediction problems, 

This appendix is intended to serve as an introduction to the 
notions of generalized inverses and pseudo-inverses, We discuss their 
use in connection with the solution of linear algebraic equations. A 


geometrical interpretation of the generalized inverse in terms of orth- 


ogonal projections is presented, For simplicity, the discussion is 
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(B.1) 


(3.2) 


(B.3) 


(B.4) 





limited to real matrices, 


B2 The Equation Ax = y 





Consider the set of linear algebraic equations; 


244% Fee e Fay, = 


6 


ani F °° °F Fay = Sy (B.5) 


or, equivalently, the matrix equation Ax = y, where A is a known m by n 
matrix, y is a known m-vector, and x is an n-vector to be determined, 
If there is no vector x which satisfies this relation, the set 
(B.5) is said to be inconsistent. Otherwise, the set is consistent and 
there may be a unique solution or an infinite number of solutions, 
Suppose a solution exists and we decide to choose the smallest 
solution; that is, the solution for which | x |lor, eometencms $= |x _ 
is minimum, If we introduce a vector of Lagrange multipliers and treat 


(B.5) as a constraint, we may minimize 
2 x'x + \' (y = Ax) 


Setting the partial derivatives of this expression with respect to x 


and A equal to zero yields the following two equations; 
x= AtA (B.6) 


(B.7) 


Ie 


iS 


Combining these equations, we obtain 


y= AA' d (B.8) 
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If AA' is non-singular we may solve this equation for A and combine 


with (B,6) to obtain the desired solution: 
x = A'(AA')” y (B.9) 


One can easily verify that if A'(AA')”! exists, then it satisfies (B.1) 
through (B.4) and hence, is the generalized inverse, 

Suppose, on the other hand, that the set (B.5) is inconsistent, 
Then one might wish to make a least squares approximation by minimizing 
|| Ax ~ yil*. If we set the derivative of this expression equal to zero, 


we obtain 
A'ax = Aly (B.10) 


If AA is non-singular we may solve this expression for x to obtain the 


desired result, 
x = (AtA)* Ary (B11) 


igi (ata)?! as esists, it satisfies (B.1) through (B.4) and hence, is the 


generalized inverse, 


B3 The Pseudo=Inverse 


A pseudosinverse of a matrix A is defined [29] as any matrix AT 


satisfying the relation 


AATA = A (B12) 


It follows from (B.12) that (at)? = (at): and that if av exists, it 
is equal to At, 


The following procedure, similar to that given in [29] , may be 
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used to obtain a pseudo-inverse for any matrix, We first consider the 
case in which A is square, It is well known [17] that any square matrix 


may be reduced to a diagonal canonical form by a similarity transformation, 


PAQ = E 


where P and Q are non-singular and E is a diagonal matrix having only 


zeros andones on the diagonal. Then, 


PAQPAQ = E“ = E 


and 


Agpa = p"lzgy! = A 


Hence, QP satisfies (B.12) and is a pseudowinverse of A. Since 
or and Pp exist, a non-Singular pseudo-inverse may always be found 
for a square matrix. 

To obtain a pseudo-inverse of a non-square matrix we make use 
of the fact that either of the following relations imply that B is a 


pseudo~inverse of A; 


(ABA - A) (ABA - A)' = 0 (B.13) 
(ABA - A)' (ABA - A) = Q (B.14) 


Direct substitution into (B.13) and (B.14) respectively will verify that 


the following expressions are pseudo-inverses of A; 
At (aa')T (B.15) 


(A'A) TA (B.16) 
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The pseudo-inverse in (B.15) and (B.16) may be computed by the first 
method since (A'A) and (AA') are square, 

Since (B.15) and (B.16) are, in general, not the same, we see that 
the pseudojinverse is not, in general, unique, In fact, if B is a pseudo- 


inverse of A, then so are 


B+ (I ~ BA)M and B+ M(I ~ AB) 


where M is an arbitrary matrix, 

The pseudo-inverse may be used in connection with the solution 
of equations such as (B.5) because it has the property that any pseudo- 
inverse of A may be used to test for the existence of a solution and 
any pseudo-inverse of A may be used to obtain a solution when one exists, 
This property is summarized in the following theorem which is similar 


to one proved by Penrose for the generalized inverse, 


Theorem; The equation y = Ax, where y and A are known, has a solution 


+ 


if and only if, for any satisfying (B.12), the following relation 


is satisfied: 
=) gaan 
Y= AY (B17) 


If (B.17) is satisfied for some pseudo-inverse al, then it is satisfied 
1 


for every pseudo-inverse of A and x = A'y is a solution of (B.5) for every 


pseudo=inverse of A. 


Proof: i.,(if), Let at be any pseudo-inverse for which (B.17) holds. 
Then at y is a solution of (B.5). 


ii.,(only if). Suppose x, is a solution of (B.5). Then, 


y = Ax, = Adtax, = aaly 
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for every pseudojinverse of A. 
iii., Let A? be any pseudo-inverse for which (B.17) is satisfied and 


let af be any pseudo-jinverse for which (B.17) is not satisfied. Then, 


yt ast y=aat ant y = aval y 


which is a contradiction. Hence, if (B.17) holds for some pseudo= in- 
verse, then it holds for every pseudowinverse and ATy is a solution of 
(B.5) for every pseudo~inverse of A. 

Let us return to the problem of finding the solution of (B.5) 
having me smallest norm, We are faced with the problem of determining 
a Lagrange multiplier ) which satisfies (B.8). Such a Lagrange multi- 


plier will exist by virtue of the theorem, if and only if, 
y = AA (Aat) ty (B.18) 


Noting that at(aa')t is a pseudo-inverse of A, it follows from the theorem 
that (B.18) will be satisfied and (B.8) will always have a solution if 
(B.5) has a solution, which was the original hypothesis. Hence, we may 


use any (AA')T to obtain 
d= (aayty 
and finally, using (B.6), the smallest solution of (B.5) is given by 
x= A'(aA') Ty (B19) 


This result differs from (B.9) in that we did not have to assume 
the existence of (ant)! It is interesting to observe that At(AA')T 
satisfies (B.3) and (B.4) as well as (B.1). 


If we return to the problem of finding the least squares approx- 
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imation, we are faced with finding an x which satisfies (B.10). Such 


an x will always exist and we may use any (ata)T to obtain 
x = (ata) Taty (B.20) 


which is similar to (B.10). We observe that (ata) tat satisfies (B.2) 


and (B.3) as well as (B.1). 
BY The Generalized Inverse 


The generalized inverse of a matrix A is the unique matrix sat-~ 
isfying (B.1) through (B.4). In the previous section we observed that 
At(aat)t and (AtA)TA! each satisfied three out of the four of these re- 
lations, Hence, it is not surprising that the generalized inverse may 
be written in forms resembling these expressions, 

We first consider the case in which A is symmetric, Then A may 
be reduced by an orthogonal transformation to a diagonal matrix having 


the characteristic roots of A on the diagonal. 
DAT = AL = diag(Ay 00054) 


We form At from J. by replacing 4, by 1/),, for ds #o, and leaving 


unchanged the terms of /\_ which are equal to zero, Then, 


shoe At (B21) 


aap ome i Gui 


It is simple to verify that at given by (B.21) satisfies (B.1) through 
(B.4). Since AA' and A'A are symmetric, we may find CAat)t and (atayt 
in this manner, One can easily verify that the following two expressions 
satisfy (B.1) through (B.4) and hence, are equivalent expressions for the 


generalized inverse of A; 
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Ar (at )® (B,22) 
(ata)tat (B.23) 


It is clear that the generalized inverse is a particular pseudo- 
inverse and hence, may be used to obtain solutions to linear equations 
when solutions exist. The generalized inverse, in addition, has the 
two important properties of always giving the smallest solution when a 
solution exists and giving the best least squares approximation when no 
Solution exists, These properties are summarized in the following theorem 


given by Penrose [51] which we state without proof, 


Theorem: Let|| A||= trace A'A. X, is said to be the best approximate 


Solution to the matrix equation AX = Y, if, for all X, either 


' 


Jax-x¥] > |] ax -x| 
or = J AK - Yi] = | AX - Yi and WX t= XI 
— oo ame() — X, || 


Then aty is the unique best approximate solution to the equation AX = Y,. 


An alternate expression for the generalized inverse may be obtained 
in the following manner, Reconsider the problem of finding a best least 
Squares approximation to the solution of the equation Ax = y. We found 
previously that any x satisfying A'Ax = A'y would do, but we shall now 
seek the smallest x satisfying this relation, We minimize the expression 


2 
Six] + At(Atax = Aty) (B24) 


Setting the partial derivatives of this expression, with respect to x 


and \ » equal to zero we obtain the following two equations; 
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AtAx = Aly (B,10) 


x= A'A ) (B25) 


A'AATA A = Aly 
Using a pseudo-inverse to solve for ), we obtain 


= (A'AA'A) ALY (B.26) 


Combining (B.25) and (B.26) yields 


x = A'A(ATAA'A)TAty (B.27) 


and hence, is an expression for at, In verifying this fact it is help- 


at = ara(ataataytar (3.28) 
at = Al (AA'AA®)TAAS (B.29) 


Suggestions for computing the generalized inverse are given in [6, Zee 


51]. 


_A Geometrical Interpretation of the Generalized Inverse 





In order to give the generalized inverse a simple interpretation 


in terms of orthogonal projections we will need some basic concepts about 


aos 





linear transformations and vector spaces such as may be found in| 2]. 
We will view the m by n matrix A as a set of column vectors 
A4,e00ran o If the equation y = Ax is written in the following forn, 


emphasizing this point of view, 


; | at 

a, Aap 20 an : Se 

, \ J 
bee 


it is clear that the equation will have a solution if and only if, y 
is a linear combination of the column vectors of A; in which case, y 
is said to belong to the range of A, ye R(A). In any case, we may de- 
compose y into the sum y = y, + y¥,, where y,« R(A) and Yo is orthogonal 
to the column vectors of A. A vector y, with this property is said to 
belong to the null space of A', Yoé N(A'), since ee 20 R(A) and N(A') 
are called orthogonal complementary subspaces of m dimensional vector 
space, Ee orthogonal, since any member of one is orthogonal to any 
member of the other ~- complementary, since their union is EB” -. subspaces, 
since any linear combination of vectors belonging to R(A) (or N(A')) also 
belongs to R(A) (or N(A')). The intersection of R(A) and N(A') is the 
zero vector. 

Suppose that S and T are orthogonal complementary subspaces of 
E", Consider a transformation © eon E' such that if we E" and w=uty, 


where uéS and véT, then 
Py =u 


Such a transformation P is called the orthogonal projection of E' onto S 
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Necessary and sufficient conditions that P be an orthogonal pro- 


jection are 
P =P 


pt = Pp 


We use the notation FR(A) to signify the orthogonal projection onto R(A). 
Let us turn our Retin to the diagram of Fig. B=l. This figure 

should be interpreted in the following manner; any point in the lower 

plane (E”) is mapped in the direction of the arrow by the transformation 


A to a point in R(A) in the upper plane (E”), 





Fig. Bo 





Consider the equation Ax = y, where y is shown in Fig. Bel. 
This equation has no solution since y¢R(A). The point nearest to y 
for which this equation has a solution is y,, where y, = PR aye If we 
now consider instead the equation Ax = y,, we see that any point x on 
the line of Fig. Bel marked Ax = y, is a solution of this equation. 
From these solutions we choose X4 » the one closest to the origin. Note 
that x, € R(A! ).- We would like to have a transformation at which would 


perform these operations. Then X, would be given by 


ee Aty (B.30) 


tt 


Let us consider the properties such an would possess, First, 
R(AT)c R(A') since every point y would be mapped by at into a point 
belonging to R(A'). Second, if X, is to be given by (B.30), we would 


require Ax, to be Pacajee THaG iS, 10 (ail 


aay Ea(aye 
or 

Aa = Baca) (B31) 
Third, if Ax = Y,» then aty. would be Bacar) That is, for all x 

A tox ~ Bey at) * 
or 

At A = Pp (Ba32) 
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The condition (B.32) implies that R(A')C R(At) since, if x<R(A'), 
then at ax = x and x eR(AT), We now have the relation R(AR)C R(A') and 


R(A')C R(AT) which implies R(A') = R(A"), ‘Thus, (B.32) becomes 


afa =P 
R(At) (B.33) 


The condition (B.31) is equivalent to (B.1) and (B.2), and the condition 
(B.33) is equivalent to (B.3) and (B.4). Hence, all the properties of 
the generalized inverse follow from (B.31) and (B.33) and are easily 


given a geometric interpretation as is done in Fig. Bel. 


B6 Further Discussion of the Equation Ax = y 


In this section we shall prove three simple lemmas which will be 


used repeatedly in Chapter IV. We again consider the linear equation 

Ax = ¥ (B. 34) 
where y and A are known and x is to be determined, 
B6.1 Lemma; 


The solution of (B.34) is unique if and only if A'A is positive 


definite, If A'A is positive definite the unique solution is 


= ata} laty (B.35) 


Ir 


Proof: Let x, and x, be any two solutions of (B.34). Then 


jo 


=A[xX = X | 


and 
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feqets5] AMA [x =x] = 0 (B.36) 


If A'A is positive definite (B.36) implies X, = X, and the solution of 
(B.34) is unique. If A'A is not positive definite, let Xx, be any non~- 
zero vector such that oes A' AX, =o, (1i.e., X» € N(A)). If X is a solution 
of (B.34) then so is x, +X, and the solution of (B.34) is not unique, 


oo | 


The solution is given by (B.35) since [A'A] A' is the generalized inverse 


or A, 
B6.2 Lemma: 


A solution of (B.34) will exist if and only if y € R(AA'); in which 


case the solution of smallest Euclidian norm is given by 
x = A'(aa' Fy (B.37) 


Proof: Using the theorem of section B3 and recalling that A'(AA')* = af, 


we observe that (B.34) will have a solution if and only if 
y = Aaty = gat (aat Fy 


Hence, (B.34) will have a solution only if y < R(AA'). Moreover, by the 


same theorem, y = AA') has a solution, (i.e., y € R(AA'), if and only if 
y = AA'(AA' )Ry 
The solution of smallest Euclidian norm is given by (B.37) since A'(AA' ft = af, 


BO.3 __Lemma: 


A solution of (B.34) will exist for all y if and only if AA' is 
positive definite: in which case the solution of smallest Euclidian norm 


is given by 
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x = AN (AA) "ly 


Proof: A solution of (B.34) will exist for all y if and only if all 
y €R(A), or equivalently if and only if N(A') = 0. But a vector Se 
belongs to N(A') if and only if Yo AA'y = 0, Thus, N(A') = 9 if and 


only if AA' is positive definite, 
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APPENDIX C 
GAUSSIAN RANDOM VECTORS 


In this appendix we summarize some known facts about Gaussian 
random vectors or, equivalently, about the multivariate normal dis~ 
tribution, Special attention is given to situations in which the 
covariance matrix is singular, The pseudo-inverse and generalized 
inverse of Appendix B are shown to be applicable to these situations, 

If x is an n-dimensional random vector with mean m and covariance 
matrix V=E | (x-m)(x-m)'] , we shall say that x is a Gaussian random 


vector if its characteristic function is of the following forn; 
C, (iw) = § | exp iwix | = exp swim - $s wv | (C.1) 


If V is non-singular (positive definite) the probability density function 


for x takes the following form; 


Z 
f(x) = ny? Ty)t exp [- $lzm I | 


(C.2) 


The definition of a Gaussian random vector in terms of the 
characteristic function has been suggested | 29] in order to avoid the 
difficulties which arise when V is singular, The results to be presented 
in this appendix have been derived elsewhere [29] using a different 
approach based on the characteristic function. We shall use an approach 
which is more in keeping with the body of this work. 

A vector u will be called a normalized Gaussian random vector if 


it is Gaussian with mean E(u) = o and covariance matrix E(uu') = I. Any 
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Gaussian random vector x can be considered as having been obtained from 
a normalized Gaussian random vector of the same dimension by a linear 


transformation of the following form: 


xom = Tu 


Then, 
E(x) =m+]2 E(u) =n 
E {(xem)(xem')} = DE(uu'!) I = I = vy 


It is shown in Appendix A that a non-negative definite matrix V may 
always be factored into a product TTI’. It will suffice for our purpose 
to know that the transformation T exists, since we shall never have to 
compute T, | 

Let x, and X> be Gaussian random vectors having the joint eae 


bution specified by 


x =) ia mee 
Xx = m = V = 
2 = Yo, top 


where V may be Singular, 
We consider x to be the result of a transformation on a norm- 


alized Gaussian vector u. Then, 


} 


X=m = Tu mee = V (C.3) 


Let H be the matrix [0 I ]so that X, = Hx. Suppose that an experiment 
is performed in which it is learned that the value of x, is z. Having 


learned the value of x, we want to lmow the a posteriori distribution 
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for x,. The a posteriori distribution for x, will be Gaussian and 
hence, will be specified by its mean and its covariance matrix, To 


determine the a posteriori mean we minimize 
a 
2] + A'(z-Hm-HTu) (C.4) 


That is, we locate the mode (or mean) of the a posteriori distribution 
for u. The Lagrange multiplier has been introduced to insure that X, 
takes on the observed value z, Minimizing (C.4) is exactly the same 
type of problem as was discussed in Appendix B, Proceeding as before, 
we set the partial derivatives, with respect to u and ), of (C.4) equal 


to zero to obtain the following two equations for the minimizing value u; 


HTu = zeHm Cc 
& = T'H') (C6) 


Combining (C.5) and (C.6) we obtain 


HTT'H' ) = z-Hm (6.7) 


As was shown in Appendix B, a \ satisfying (C.7) will always exist if 
there is a u which satisfies (C.5). There will be a u which satisfies 
(C.5) unless the observed value of x, has probability zero; that is, 
unless the singularity of the a priori covariance matrix V makes the 
observed value inconsistent with prior knowledge, We rule out this 
possibility since it leads to division by zero in Bayes! Theorem and 
violates our basic assumption, Values of Xx» for which a solution to 
(C.5) exists will be called consistent, Substituting for Hands Tt! 


in (C.7) yields 
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V55 d = Z-n, (C.8) 
Any pseudo-inverse may be used to solve for \}. Then, 
a 
“ Co 


Combining (C.9) and (C.6) gives the following equation; 


Finally, using (C.3), we obtain 
Sam = VH'V!. (zem,) (c.10) 
AB © 2S £22\2-22 


This is equivalent to the two equations 


m, + Wy alna(2Be) (Gs1t) 


X> = Mo t Vo Vo (2=tty) (C.12) 


From the pseudo~inverse theorem of Appendix B we know that the existence 


of a solution of (C.8) implies the following relation; 


Za 


m, = ViVoo(z -M,) (C.13) 


Combining (C.12) and (C.13) leads us to the consistent result Xo = 2 


The basic result is (C.11) and is summarized in the following statement. 


Statement C-1; The conditional expectation of X,, given a consistent Xo 


is given by the following relation; 


T 
1 = 8, + Vi Vom 


We may obtain a useful relation from (C.13) by using the fact 


that for all consistent values of x, the following relation is satisfied; 
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Xo-My = VooV Vo (%p-By (6.14) 


Postmultiplying both sides of (C.14) by (x,-m, )! and averaging with re- 


spect to the joint a priori distribution yields 


aE 
Voy = Lop¥ooby (C.15) 


Taking the transpose of (C.15) gives the following useful relation; 


oT 
Yin = Viakoatos oe) 


In the development of an expression for the covariance matrix 
of the a posteriori distribution for x,, given x,, the concept of pre- 
posterior analysis [ A] will be used, Given any value of x» we may find 
the a posteriori distribution for x, which will, in general, depend on 
the value Of Xyo 
posteriori distribution may be considered to be random variables, Pre- 


Before the value of x, is known, parameters of the a 


posterior analysis is concerned with the a priori probability of obtaining 
particular values for parameters of the a posteriori distribytion. We 
shall need the following simple lemna; 


Lemma: Let E [ P(x, )] be the expected value of some function B(x, ) 
De ie 
eles 


with respect to the a posteriori distribution, Then the expected value 


of : ' [ G(x, )] with respect to the a priori distribution for x, is the 
=11=2 


a priori expected value of $(x,). That is, 
| z ae = E[6(x,)} 
X, 


Xo L X1X 


The proof is by inspection since on the left side of this expression 
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the average is with respect to the joint a priori distribution for x, 
and Xn 

Using this lemma we may easily prove the following theorem given 
in [54]. 
Theorem; The expected value with respect to X> of the covariance 
matrix of the a posteriori distribution plus the covariance matrix 
with respect to x, of the mean of the a posteriori distribution is equal 


to the covariance matrix of the a priori distribution for x,. That is, 


+ £E B. (#) E(x.) Al = gE Bese) E E iG 
Bea ee a leis, x, [ele 
= E[xx'] - E[x,] [E [x]17' 
oe 7 Pu 


The proof is immediate by canceling like terms and applying the lemma, 
This theorem is sometimes stated for scalar random variables in the 
following concise form, The mean of the posterior variance plus the 


variance of the posterior mean is equal to the prior variance, 


Statement C=2: The covariance matrix with respect to x, of the mean of 
sgh 543 : i T ‘ 
the a posteriori distribution for x, is Le This statement 


follows from C-1 since 


Wey wey 


E [&-mQem)' = y 
- [ , Ns, 1 Eu calemee recat 


=)? 


Using (C.15) or (C.16) concludes the proof, 
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Statement Cu3: x, - & (X, ) is independent of x x, 
one 


To show this we compute the covariance matrix of this quantity 


and ae From Statement C~1 we have 


E + - E  (x))(x-m,)"] 


= E {[x,-2, - =» V vi (x =m )] (xX =H, se v= vy 
a re eee ane EWE Be 22922 
=| sae 


This quantity is equal to 0 by (C.16). 
This statement implies that the covariance matrix of the a poster. 
tori distribution for x, is independent of the observed value of X, and 


hence, the expected value with respect to x, of this quantity is equal 
to the quantity itself, Using this fact and the theorem leads to the 


following statement, 


Statement C-4: The covariance matrix of the a posteriori distribution 
fe 
tens: Xy given Xx, is V, = Vee re 
The generalized inverse may, of course, always be used in place 
of the pseudo-inverse, The use of the pseudo-inverse has been consistent 


with our effort to use a notation compatible with | 29 | » in practice one 


might prefer to use the unique generalized inverse, 
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APPENDIX D 
GENERAL SOLUTION OF THE DISCRETE-TIME LINEAR ESTIMATION PROBLEM 
D4 Introduction 


This appendix is devoted to a study of a very general form of 
the linear discrete-time estimation problem in which the covariance 
matrices may be singular, For the sake of generality and at the sacrifice 
of some simplicity, we allow correlation between the random none w and 
the measurement noise v. Necessary background material appears in Chapter 


II and Appendices B and C. 

De2 Problem Statement 
Consider the system 
x(k+1) = F(k)x(k) + G(k)w(k) (D.1) 
2(k) = H(k)x(k) + v(k) (D.2) 


where x is the state vector and z is the observation. { w(k)}, the random 


* 


input and { v(k)}, the measurement noise, are sequences of independent ran- 


dom variables with zero means and the following covariance matrices; 


E {u(5) w'(k)} = Qk) 55, 
EB [v(j) v'(k)] = R(k) Ss for all intergers j and k 
E[m) w'k)] = S(k) J 5, 


Based on a finite observation sequence | z(0),...,2(n)}, we seek an 


estimate of the sequence of states {%(0),005,x(ntm)} . 
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D3 Filtering and Prediction 


Suppose that, given the observed sequence { 2(0),.0«.2(k~1)§, 
the distribution for x(k) has mean x(klk=1) and covariance matrix P(k). 


Consider the vector 


a baa | ae | 
= with mean 
z(k) H(k)x(k) + v(k) H(k)x(k|k=1 ) 


and having the following covariance matrix; 


P(K) P(k)H? (ic) 


H(k)P(k) H(k)P(k)H'(k) + R(k) 


If z(k) is observed, we may use statements (C-1) and (C-4) from 
Appendix C to obtain the following expressions for the mean X(klk) 


and covariance matrix C(k) of the distribution for x(k) given { z(0),...,2(k)§; 
£(klk) = £(k |ket) + P(K)HY (k) [HC )PC)H* (k) + R(«)) " 
[ 2(k)-H(k)X(k1k=1 )] (D.3) 
G(k) = P(k)=-P(ic)H! (k) [H(k)PCc)H" Ck) + R(K)]? HCPC) (D./+) 
Similarly, consider the vector 
x(k+1 ) F(k)x(k) + g(c)w(k) F(k)&(k|k+1 ) 
ra | : oa + x(k) fs — on 7 
and having the following covariance matrix; 


F(k)P(kK)F'(k) + G(k)Q(k)G! (k) F(k)P(k)H' (k) + G(k)S(k) 
H(k)P(k)F' (k) + S'(k)G! (k) H(k)P(k)H'(k) + R(k) 
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If z(k) is observed we may again use statements (C-1) and (C-4) to 
obtain the following expressions for the mean x(k+1 |k) and covariance 


matrix P(k+1) of the distribution for x(k+1) given { 2(0),000.z(k)}; 
x(k #1] k) = E(k)X(ke] kot) + [E(k)PCc)H" (ik) + G(k)S (k)] 
[ H(c)P(k)H! (ik) + BCk)]* [2k )mH (ic) &(ie1ket )] (D.5) 
P(kH1) = F(k)P(k)F!(k) + G(k)Q(k)G! (k) 
~ [ F(«)P(K)E' (k) + G(k)S(k)]} [HCK)ECK)E' (Kk) + R(K)]? 
[ H(k)P(kK)F'(k) + S*(k)G*(k)] (D.6) 


With practically no effort we have obtained the solution to the 
linear filtering problem and the solution to the problem of predicting 
one step ahead. Equations (D.5) and (D.6) are, except for slight details, 
the same ones as those given in[29]. 

Note that x(k#i|k) is not simply an extrapolation of £(k|k). 

This is due to the fact that information about w(k) is available in 
z(k) because w(k) and v(k) are correlated, However, X(k+2 |k) and pres 
dictions further into the future are simply extrapolations of x(k+1 lk), 
Since no information is available about future values of w. Since it 
is now evident how to make predictions, we may confine ourselves to the 


problem of estimating } x(0),0o.,x(n+1){ given { 2(0),.00,2(n)}. 
DY __ Preliminary Considerations 
Let r(k) = G(k)w(k), and consider the vector 


x(k) G(k)Q(kK)G'(k)  G(k)S(k) 
with covariance matrix 
v(k) S!'(k)G!' (k) R(k) 
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We consider the vector formed by combining r and v to have been 


obtained by a transformation on a normalized vector u(k), E[u(j)u'(k)] =1 5 sk° 


r(k) G(k)Q(k)G'(k) G(k)S(k) 
ny | 7 Reads ROE) 
we 


S'(k)G" (k) R(k) 


Let us introduce the matrices V = [0,I ] and W = [1,0] such that 


r(k) = WI(k)u(k) (D.7) 
v(k) = VE(k)a(k) (D.8) 
WE(k)I' (k)W' = G(k)Q(k)G" (k) Do) 
VI(k)2"(k)V! = R(k) (D.10) 


W2(k)I'(k)V' = G(k)S(k) (D.11) 


The a priori distribution for the initial state is Gaussian with 
mean m and covariance matrix P(o) which may be singular, We shall con- 
Sider the initial state as having been obtained by a transformation ona 


normalized vector 9 


x(o) - m = AQ AA' = P(o) (D.12) 


D5 _ Derivation of the General Solution 


Introducing the Lagrange multiplier vectors @, d(k) and @(k), we 


may minimize the following expression; 


a n 
T7 74a +$'[x(o)-m-se] + 20 28k) [2 (c)aH(k)x(k)-VE(k)a(k)] 
=O 


2 
+ $ lle) i] + QU) [Cet =F (e)x(c)=WE(k)a(k)] | (D.13) 
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Setting equal to zero the partial derivatives of I, ,, with respect to u(k), 
Q(k), A(k) (for k=0,.00,n); x(k) (for k=o,...,nt1); and 6, yields the follow- 


ing relations; 


GaSe’ (D.14) 
@ = H'(0) @(o) + F'(o) Mo) (D.15) 
u = T'(k)V'@(k) + T'(k)W' A(k) (D.16) 
A(k=1) = Ft(k)A(k) + H'(k)2(k) K=1 ,o00.N (D.17) 
A(n) = 9 (D.18) 
x(o|n) =m + A@ (D.19) 
X(kH [n) = F(k)k(kin) + W2(kc)u(k) K=0y e005 (D.20) 
2(k) = H(k)x(k|n) + VP(k)u(k) (D.21) 


Combining (D.14), (D.15) and (D.19) and observing the AA' = P(o) yields 
x(oln) = m + P(o)H'(o)&(o) + P(o)E*(o)X(o) (D.22) 


Combining (D.i16) and (D,.20) and using (D.9) and (D.11), we obtain the 


following equation: 

x(k+1|n) = F(k)x(kln) + G(k)S(k)Q(k) + G(k)Q(k)G" (k)A(k) (D.23) 
Using (D.10), (D.11) and (D.16), we rewrite (D.21) in the following form; 

2(k) = H(k)x(k|[n) + R(k)Q(k) + S*(k)G'(k)A(k) (D.24) 


We now have a two-point boundary value problem given by (D.23) and 
(D.17) and the constraint (D.24). The boundary conditions are given by 


(D.18) and (D.22). 
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In order to simplify future equations we define the following 


quantities; 
B(k) = [H(k)P(k)H'(k) + R(k) }* (D.25) 
M(k) = B(k) [ K(k)P(k)F'(k) + S'(k)G"(k) | (D,26) 


We begin by considering the boundary condition for x(ofn). Com= 


bining (D.22) and (D.24) yields 
2(o) = H(o)m + [| H(o)P(o)H'(o) + R(o)}] (0) 
+ [H(o)P(o)F' (0) (0)G'(o)] A(o) (een) 
Using the generalized inverse to solve for a (0) yields 
2(o) = Blo) [2(o)-H(o)m] - M(o)A(o) (D.28) 


Note that any pseudo-inverse could have been used to solve for e(0). 
The generalized inverse is convenient because it allows us to define 


B(k) uniquely. Combining (D.28) and (D.22) yields 
X(oln) = m + P(o)H'(0)B(o) [2z(0)-H(o)m ] 
+ [ P(o)F'(o)=P(o)H"(0)M(o)] d(o) (D.29) 
or, equivalently, 
X(oln) = Z(0lo) +[ P(o)E'(0)-P(o)H' (0)M(o)] (0) (D.30) 


It is reassuring to note that for the special case S = 0, (D.30) reduces 


to the following familiar equation of Chapter II; 


X(o|n) = X(olo) + G(o)F*(0))(o) »S(o) = 0 (2.41) 


JAA 





We now hypothesize that the general solution is of the following 


form: 
x(k|n) = x(k|k) + [P(k)F'(k) = P(k)H'(k)M(k)] \(k) ,k=0,000,n (D231) 


A(ke1) =[ E'(k)=H' (k)M(k)} (k) + H!(k)B(k) [ 2(k)-H(k)x(kIk-1 ) | 
(Dea2) 
where x(klk), x(k#i|k) and P(k) are given by (D.3), (D.5) and (D.6). The 
hypothesis (D.32) is equivalent to requiring that @(k) be given by the 


following equation: 
Q(k) = Blk) [ 2(k)-H(k)x(k/ke1)] = M(k)A(K) (D.33) 


We have already shown that (D.31) and (D.33) are satisfied for 
k=o, To establish a proof by induction we need to show that if (D.31) 
and (D.33) are satisfied for some k € (0,¢60.,n=1), then (D.31) and (D.33) 
are satisfied for kt1. 

Suppose that (D.31) and (D.33) are satisfied for some k € (0,...,n=1); 


then from (D.23) we have the following relation; 
x(kHi[n) = F(k)x(klk) +[F(k)P(k)F'(k) = F(k)P(k)H' (k)M(k) 
+ G(k)Q(k)G'(k) = G(k)S(k)M(k) ] X(k) 
+ G(k)S(k)B(k) [ 2(k) = H(k)x(kIk~1)] (D. 34) 


Using (D.3), (D.5) and (D.6), we may rewrite (D.34) in the follow- 


ing form; 
x(k In) = x(k+11k) + P(k+) \(k) (D.35) 


Substituting for \(k) from (D.17) yields 


elze= 





x(k#11 n) = X(k-+1 | kk) + P(k+1 )H' (k+1)2(k+1) + P(k+i )F' (k+1 )ACk+1 ) (D.36) 
From (D.24) we may write 
2(k+1) = H(k+ )x(k-+1 | k) + [ H(k+1 )P(k+1 )H' (k+1) + R(k+1 ) @ (k+1 ) 


+[ H(k+1 )P(kH )F'(k+1) + S'(k+1)G"(k+1)] \(k+1) (D.37) 
Using the generalized inverse to solve for a e(k+1) we obtain the 


following relation which satisfies the hypothesis, (D.37); 
Q(k+1) = B(k+t1) [ 2(kH )-H(k+1 )x(k+11k)] - M(k+1) \(ic+1 ) 


Combining (D.38) and (D.36) and using (D.3), we obtain the following 


equation which satisfies the hypothesis (D.31) and completes the proof; 


x(k#H1n) = x (let |k+1) +[ P(k+1 )F!' (k+1 )=P(k+1 )H' (k+t )M(k+1 ) | A(k+1) 
(D.39) 


We may summarize these results in the following theorem: 


Theorem: The general solution of the linear estimation problem as 
specified by the two-point boundary value problem (D.23), (D.17), (D.24), 


(D.18) and (D.22) is given by the following recurrence relations: 


x(kH Ik) = F(k)x(klket) + M'(k) [ 2(k)=H(k)x(kl ket) ] (D.40) 
P(kH) = F(k)P(c)F! (Ie) + G(ik)Q(1e)G! (ic)=M! ()B* (1e)M (Ke) (D6) 
x(klk) = X(k|k-1) + P(k)H' (k)B(k) [ z(k)=H(k)x(Kk1k=1) ] (D.42) 


x(kIn) = x(k|k) + { P(k)F'(k)=P(k)H! (k)M(k)]} \(k) k=0,000,n (D.43) 


Miet) =[E'(k)=H! (ie)M(e)] X(ic) + HY (BC) [ 2 (ke) aH (ic) RC thet )] (D-H) 


k=1 , o's’o 5M 


Ae 





where B(k) and M(k) are defined in (D.25) and (D.26), The recursion 
begins by replacing x(k k~1 ) for k=o by m in (D.41) and (D.42). Pre- 


A 
dictions are extrapolations of x(n+1/n); for example, 


X(n#2|n) F(n+1 )x(n+1 {n) 


X(nt3| n) = F(nt2)X(n+2\|n) (D645) 


For the special case S(k) = 0 for all k (D.43) becomes 

x(kin) = x(klk) + C(k)F" (k)\(k) (D.46) 
where C(k) is given by (D.4),and (D.44) beomes 

\(ket) = f THM) [ HOCBOCH' Ck) + B(c)] "HCE OK) } Bt (ie) MC) 


+ HY(k) [ H(k)B(K)H!(k) + RO)] [ 2(kc)-H(k)S (kl ket )] 
(D.47) 
Discussion; Equations (D.40), (D.41) and (D.42) are simply other forms 
of (D.5), (D.6) and (D.3) respectively. The generalized inverse may be 
replaced by any pseudo-inverse if (D.6) is used in place of (D.41). 

The calculation procedure is to use (D.40), (D.41) and (D,42) to 
obtain x(kHilk) and #(klk) for k~o,...,n. Then calculate \(k) by back- 
ward recursion of (D.44). Once A(k) is obtained, (D.43) may be used to 
obtain x(kin) for all k € (0,e00,N)« 

If the measurement noise is not independent, i.e., E[v(3)v' (k)] 4 } 3,R(Kk) 
we may consider this noise to be the output of a linear system excited 
by white noise and adjoin the states of this fictitious linear system to 
the basic process, There is no need to assume that F(k) is non-singular, 

We can express the solution of the linear smoothing problem in a 


Simpler form by writing the expressions in terms of X(k|k-1). Com- 


Es 





bining (D.42) and (D.43) yields 
X(kIn) = X(klk=1) + P(k)B'(k)B(k) [ 2(k)=H(k)x(k|ke1) ] 
+ P(k) [ F'(k)-H'(k)M(k)] )(k) 


This expression may be combined with (D.44) to obtain the following 


equation; which is a restatement of (D.35)>: 
X(kIn) = X(klk=1) + P(k)\(ke1) (D.48) 
We may summarize this result in the following corollary; 


Corollary: The solution of the linear smoothing problem is given by 
(D.48), (D.40), (D.41) and (D.44). (D.44) is now valid for k=0,000,N. 
The boundary condition )\(n) = o remains in effect. The a priori mean 


mis to be used for X(ol=1) in these expressions, 
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APPENDIX E 
PROBABILITY DENSITY FUNCTIONALS FOR GAUSSIAN RANDOM PROCESSES 


In this appendix we shall given an introductory discussion of 
probability density functionals. Our treatment will be formal, since 
a rigorous discussion of "white noise" and probability density functionals 
is beyond the scope of this report. The concept of a probability density 
functional was introduced by Preston | 53] and was extended by Thomas and 
Zadeh [ 62] to include vector Gaussian processes, For a more abstract 
treatment the reader is referred to the work of Parzen [72]. 

Consider a vector random process | x(t){, for o £ t = T, consisting 
of a sequence of steps occurring at intervals T/N apart. Assume that the 
magnitude of the steps is normally distributed with zero mean and the 


1 


following correlation (covariance) matrix; 
B [x(it/N)x'(j7/N)] = R(iT/N,jT/N) ig ode) Ga 


A sample function of a scalar version of such a process is shown in Fig. 


Hee | 
x(t) 
O t 
i 2a ese T 
N 


Fig. E-1 
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The probability density function for the sequence of points 


{x(T/N) ,x(2T/N),ooesesx(T)} is 


p{x(T/N),.oe.x(T)] = C' exp ~ 4 , x! (iT/N)H(iT/N, JT/N)x(jT/N) 
l= j= 
(E.2) 
where C is a constant of proportionality and where H and R are related 


by the following expression; 


N 
2 R(iT/N,kT/N)H(kT/N,jT/N) = L $3; (E.3) 


Let us define a function, G(iT/N,jT/n), as follows; 


a 


G(47/N,§T/N) = H(AT/N,jT/N) [n/t] (B4) 


4 


Then (E.2) may be rewritten in the following form: 


N 
Palec(l Ngee x(l)| = Cr expr < = 5 >, x'(4T/N)G(4T/N, JT/N)x(5T/N) in 


el gael 
(ie5) 
We may also rewrite (E.3) as 
ROTI KE/N)G (E/N, 3t/N) [2/N] Cy/0] 
R(iT/N,kT/N)G(KT/N, jT/N T/N = |] : N/T 
ps ih 1 u J ~ éa5 (E.6) 


If we take the limit as N->© the resulting process {x(t){ is a 
Gaussian random process, A random process | x(t)] is said to be Gaussian 
if the joint distribution for every finite set of points ‘ x(t, ee X(t.) } 
is normal. 


In the limit as N-=~ , (E.6) becomes 


tT 7 





t R(t, »t)G(t, to dt = 1 § (t,-t,), (o = t, st, < T) 

0 (E.7) 
A matrix kernel G(t,t,) satisfying (E.7) is said to be inverse to the 
correlation matrix R(t, ,t). 

If we were to take the limit of (E.5) as N7© , the constant 
of proportionality C would become infinite. To avoid this difficulty 
we define the probability density functional as a ratio, using a definition 
sande to that given by Thomas and Zadeh [ 62] o Let z (o,T} be a vector of 
curves, Let h be a small positive scalar and let h be a vector, each 
element of which is equal to h. Then, the probability density functional 
of the process { x(t) { on the interval lo,T | is defined as follows; 


Px [2jon]] = 2a Pe Lattin <xtt) < atthe tor 0 <t <7] 


Pr [wh < x(t) = h, foro#té T | (E.8) 


= exp (- 4 r ns 2" (by JG(ty »tp z(t) dty dt, 
O O (E.9) 
Note the similarity between (E.5) and (E.9). 
White Gaussain noise is defined to be the formal limit of a 
Gaussian process as the values of { x(t) {at different instants of time 
become statistically independent, In this case R(t, »t.) becomes 


R(t, ) § (t,-t,) and G(t, rt) becomes Pe (t,) J (t,-t5)- Then (E.9) becomes 


toe 


- 
Px [2po,q]] = OP) fo atcepert(tyect) at 
= |*[o,T] 0 (E.10) 


tole 


= exp = 


T Z 
J | 2(t) |}, at 
O R(t) (E.11) 
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Probability density functionals are a powerful tool when used for 
problems involving Gaussian random processes on a finite time interval 


and will be used in Chapter III of this report. 
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